Actel





The FPGA Design Guide 




August 1991 











Contributors: Sam Beal, Nancy Canyon, Steve Gurklys, Dennis McCarty, Warren Miller, Bruce Weyer 


ACT, Action Logic, Activator, Actionprobe, ALES, and PLICE are trademarks of Actel Corporation. 


ABEL, CUPL, Data I/O, EPM5128, LCA, LOG/IC, MAX, MegaPAL, Mentor Graphics, OrCAD, PAL, PALASM, PGADesigner, UniSite, Valid, and Viewlogic 
are trademarks or registered trademarks of their respective manufacturers. 


Actel Corporation reserves the right to make changes to any products or services herein at any time without notice. Actel does not assume any 
responsibility or liability arising out of the application or use of any product or service described except as expressly agreed to in writing by Actel. 


© 1991 Actel Corporation 













Introduction ..... Page 1-1 
Design Prediction ..... Page 2-1 
Design Tools ..... Page 3-1 


Design Examples ..... Page 4-1 










Section 1: Introduction

Introduction ..... Page 1-1
Actel Architectural Advantages ..... Page 1-7
High-Density Programmable Logic Taxonomy ..... Page 1-9


Section 2: Design Prediction

Estimating Actel FPGA Delays ..... Page 2-1
Estimating Gate Capacities of Programmable Logic Devices ..... Page 2-15
Predicting Power in ACT FPGAs ..... Page 2-25


Section 3: Design Tools

Converting Multiple PLD Designs to FPGAs ..... Page 3-1


Section 4: Design Examples

A TTL Designer's Guide to Using FPGAs ..... Page 4-1
Designing Counters with the ACT 2 Architecture ..... Page 4-5
Implementing Fast Counters with ACT 2 FPGAs ..... Page 4-11
Designing Adders and Accumulators with the ACT 2 Architecture ..... Page 4-19
Increase FPGA Performance Using Module/Speed Trade-Offs ..... Page 4-25
FPGAs are Better for State Machines than PLDs ..... Page 4-31
Page Mode DRAM Controller ..... Page 4-37
Four-Channel DMA Controller ..... Page 4-39
Using FPGAs for Digital PLL Applications ..... Page 4-43





Section 1: Introduction


Introduction ..... Page 1-1 
Actel Architectural Advantages ..... Page 1-7 
High-Density Programmable Logic Taxonomy ..... Page 1-9 




Introduction 





You no longer have to accept the compromises and risks that 
plague traditional logic integration technologies — and hinder 
your design cycle, efficiency, and creativity. 


Traditional logic integration solutions demand that you compromise. 
If you opt for the density, performance, and design flexibility offered 
by mask-programmable gate arrays, you have to accept prototype 
turnaround times of two, four, even eight weeks. You also have to 
risk tens of thousands of dollars in nonrecurring engineering costs. 
On the other hand, if you opt for the desktop programmability and 
immediate turnaround of PLDs, you have to forgo high levels of 
integration and design flexibility. 


Actel’s No Compromise Logic Integration Solution 


Given the importance of the decision, Actel doesn’t believe you 
should be forced to make such compromises. That’s why we took a 
completely new approach to logic integration. Rather than using 
device programming technologies borrowed from memory devices, 
Actel invented a whole new solution optimized for integrated logic 
designs: the PLICE™ (Programmable Low Impedance Circuit 
Element) antifuse. 


We combined our new programming technology with a channeled 
gate array architecture — a feat made possible by the small size and 
low resistance of the Actel-invented PLICE antifuse. Together, our 
PLICE antifuse technology and programmable gate array 
architecture form the foundation of a completely new logic 
integration solution. 


Actel’s No Wait Logic Integration Solution 


High gate counts and application flexibility exhibited by channeled 
gate arrays stem from their significant interconnect resources. 
Actel currently offers up to 750,000 potential interconnect sites in 
an 8,000-gate user-programmable gate array. 


By incorporating roughly one hundred antifuses per gate of logic, 
we are able to offer a programmable device with design flexibility 
and interconnectivity comparable to a conventional masked gate 
array. What's more, ample interconnect resources and our 
channeled gate array architecture allow us to leverage years of 
algorithm development for fully automatic placement and routing. 
We've developed a solution that fully automates the physical design 
process — increasing your productivity and reducing your time to 
market. 
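
The "roughly one hundred antifuses per gate" figure follows from the two numbers quoted above for the 8,000-gate device. A quick check in Python, with variable names chosen here purely for illustration:

```python
# Figures quoted in the text for the 8,000-gate device.
interconnect_sites = 750_000   # potential antifuse interconnect sites
usable_gates = 8_000           # gate array equivalent gates

antifuses_per_gate = interconnect_sites / usable_gates
print(antifuses_per_gate)      # 93.75, i.e. roughly one hundred per gate
```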


You can create gate array prototypes in a matter of hours. And you 
can finally afford to be innovative without fear of costly, 
time-consuming mistakes. In a nutshell, we offer you the first 
channeled gate array that lets you design, prototype, and produce. 
On the spot. At your desk. With no NRE. 




Actel’s System Solution 


A successful integrated logic design is possible only if you have 
access to powerful, sophisticated design automation tools. Actel is a
firm believer in allowing the user, the specialist, to determine his or 
her preferred design tools. For this reason, Actel has developed the 
Action Logic™ System, an easy-to-use CAE environment which 
allows the designer to leverage his or her existing design tool 
expertise to quickly bring a design to life (Figure 1). 


Schematic Capture and Simulation 


The Action Logic System (ALS) is fully integrated and easy to use.
Users enter their schematics and simulate designs using the ACT™ 1
or ACT 2 Macro Libraries and schematic capture/simulation tools
from one of four popular suppliers: Mentor Graphics®, OrCAD™,
Valid™ Logic, and Viewlogic® Systems. The Action Logic System has
full hierarchical support and allows user-defined soft macros to
speed the design process. Our extensive macro libraries contain
over 250 macros, ranging from simple flip-flops and TTL functions
to larger logic functions, such as 16-bit counters and 32-bit fast
adders.


If your expertise lies within the PAL™ design arena, we also have a 
solution for you. ALES™ 1, Actel’s PAL/PLD mapping and 
optimization tool, allows quick and easy implementation of logic in 
Actel FPGAs, using either popular PLD design tools like ABEL™, 
CUPL™, LOG/IC™, and PGADesigner™ or the familiar 
PALASM®2 syntax. It can provide Actel-specific logic
implementation for existing files created with PLD design tools or 
netlist files generated from schematics. ALES reduces design time 
for all types of sequential logic and optimizes logic designs from 
Boolean equations, state machines, or schematics for 
implementation in Actel FPGAs. 


Validation 


The Actel Validator checks for electrical or design rule violations 
specific to the selected Actel device, such as fan-out violations and 
connectivity errors. It also reports important design information 
such as types and quantities of macros used, logic module count, 
I/O pin count, and average net fan-out. The Validator generates a 
Design for Routability (DFR) file containing buffering and I/O pin 
suggestions to facilitate design layout. For a 90% utilized A1020 
design, the process takes approximately five minutes. 


Automatic Placement and Routing 


The Actel place and route program is 100% fully automatic and 
allows user control of I/O pin assignment and critical path routing. 
The place and route program delivers a guaranteed 85% to 95% 
gate utilization on the first pass. An average design is placed and 
routed in under 25 minutes. 


Timing Analysis 


The Actel Timer determines and highlights paths specified by the 
user for timing analysis. The user selects operating conditions 
(commercial, industrial, or military) and indicates typical or worst 
case. The Timer will identify worst-case paths, analyze clock skew, 
calculate setup times, and determine maximum frequency. 


Figure 1. Design Capture
(design flow: Schematics and Equations entry, Simulation, Auto Pin
Assignment, Place & Route, Delay Back Annotation, Timing Analysis,
Programming, Verification & Debug)





Programming 


The Activator™ programming unit offers one-button ease-of-use. 
Low-cost Activator 1 units program a single unit of any device from 
the ACT 1 family. Activator 2 units program up to four devices 
from the ACT 1 or ACT 2 family. Thanks to a modular design, the 
Activator 2 can program different devices or like devices with equal 
ease. Both units perform device testing during programming. 
Typical programming times are 3-5 minutes for an A1010 and 5-10 
minutes for an A1020. Third party programming support for ACT 1 
devices is available on the Unisite™ Model 48 from Data I/O®
Corporation. 


Debugging and Diagnostics 


Actel’s debugging software can be used for functional verification 
while the part is in the Activator socket or in its target system. Via a 
software interface, the user can force the state of the input pins and 
examine the response of output pins. The result is displayed 
interactively on screen. Unique to the Actel design environment 
are the Actionprobe™ diagnostic units which allow 100% internal 
real-time observability of a programmed device. Any two internal
nodes can be connected to two dedicated pins so that their activity
can be observed. With a single command, users can move the probe 
function to other internal nodes and observe the entire chip if they 
so desire. 


Actel's Device Architecture 


Design Flexibility 


Actel's devices are designed for flexibility, from the ground up. The 
fundamental building block of our flexible solution is the PLICE 
antifuse. This revolutionary interconnect element not only 
provides a low-impedance connection which minimizes 
propagation delays, but due to the miniature size of the element,
also allows exceptionally high routing resources to be distributed 
throughout our programmable gate array architecture. For 
example, a macro in an ACT array can connect to over 30% of the 
available macros by using only two antifuse connections, as shown 
in Figure 2. Over 90% of all connections use only two antifuses. As 
a result, the delay distribution is very predictable and overall 
performance is enhanced. 


Figure 2. Antifuse Routing Paths (showing Two-Antifuse Paths)





Actel’s channeled architecture offers an abundance of vertical and 
horizontal interconnect tracks. Long and short interconnect tracks 
have been staggered in both the horizontal and vertical array 
directions. This mix of interconnects allows Actel to minimize the 
number of antifuses needed to connect any two modules. In fact,
Actel has limited the number of antifuses per interconnect to four
or fewer in both the ACT 1 and ACT 2 families. The limited number
of low impedance interconnect elements results in a very narrow
distribution of propagation delays, which allows Actel to offer the
most predictable timing in the industry.


This high level of routing resources lets Actel offer a fine
granularity architecture. The fine granularity of our modules
allows a wide range of macros to be developed with minimal
consumption of gates. If larger logic blocks are desired, the low
impedance antifuse allows multiple modules to be used with minimal
delay penalties. This combination of fine granularity and high
routing resources provides an excellent framework for automatic
placement and routing tools, resulting in the most routable, most
efficient FPGAs in the industry.


ACT 2 Architecture 


C-Modules and S-Modules 


The ACT 2 family offers dedicated combinatorial and 
combinatorial-sequential modules, as shown in Figures 3 and 4. 
The combinatorial module, C-Module, has been enhanced to 
implement high fan-in combinatorial macros, such as 5-input 
AND, OR, NAND, and NOR gates. Additionally, AND-OR gates, 
XOR gates, AND-XOR gates, and many other combinatorial 
functions are available. 








Up to 8-Input Function 


Figure 3. C-Module Implementation 





The combinatorial-sequential module, S-Module, has been 
optimized to implement high-speed flip-flops within a single 
module. Furthermore, S-Modules also include combinatorial logic, 
allowing an additional level of logic to be implemented with no 
additional propagation delay. 


Figure 4. S-Module Implementations
(panels: Up to 7-Input Function Plus D-Flip-Flop with Clear; Up to
7-Input Function Plus Latch; Up to 4-Input Function Plus Latch with
Clear; Up to 8-Input Function, same as C-Module)
















Hard and Soft Macros 


Designing within the Actel design environment is accomplished 
through a building-block approach. Over 250 widely used logic 
functions are stored within our macro library. Each macro 
represents one of our basic to complex building blocks from which 
you may build your design. These macros range from simple logic 
functions such as AND gates, to more complex logic functions, such 
as 16-bit counters and accumulators. 


The macros are implemented within the Actel architecture by 
utilizing one or more C-Modules and/or S-Modules. Over 150 of 
these macros are implemented within single modules, and an 
additional 25 macros are implemented by connecting only two 
modules. One-module and two-module macros have small 
propagation delay variance, providing accurate performance 
prediction capabilities. These macros are called Hard Macros, and 
their propagation delays are specified within the datasheet. 


More complex logic functions are also included in our macro 
library. These Soft Macros are implemented by using several Hard 
Macros. The propagation delays of Soft Macros are not specified 
within the datasheet. 


Capacity 


How large is large? Only the designer really knows how large his or 
her programmable gate array should be. For that reason, Actel 
counts the density of its devices as the number of gates which the 
designer can use in his or her design, much like gates are counted in 
gate array designs. If you have a design which requires 8000 gate 
array equivalent gates, then we recommend you use our 8000-gate 
device, the A1280.


                Device    Gate Array Gates    PLD/LCA* Gates
ACT 1 Family    A1010A    1,200               3,000
                A1020A    2,000               6,000
ACT 2 Family    A1225     2,500               6,250
                A1240     4,000               10,000
                A1280     8,000               20,000


* Counted as equivalent PLD/LCA™ gates.
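
The sizing rule above (match the required gate array equivalent gates to the device's rated capacity) can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, the capacities are copied from the table, and a real selection would also weigh I/O count, speed, and utilization.

```python
# Usable gate array equivalent gates per device, copied from the table above.
GATE_ARRAY_GATES = {
    "A1010A": 1_200,
    "A1020A": 2_000,
    "A1225": 2_500,
    "A1240": 4_000,
    "A1280": 8_000,
}

def smallest_fitting_device(required_gates):
    """Return the smallest listed device with enough gates, or None if none fits."""
    candidates = [(gates, name) for name, gates in GATE_ARRAY_GATES.items()
                  if gates >= required_gates]
    return min(candidates)[1] if candidates else None

print(smallest_fitting_device(8_000))   # A1280
print(smallest_fitting_device(3_000))   # A1240
```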


Performance 


As you look closer at Actel’s ACT 1 and ACT 2 devices, you cannot
mistake the huge performance impact which the PLICE antifuse
has made. No matter how fast programmable modules may be, they
must have fast interconnects to deliver that performance.
Guaranteeing that four or fewer low impedance antifuses are used
per interconnect path is one way in which Actel ensures high levels
of performance.


Performance is also measurable at the macro design level. When 
establishing performance standards for macro designs, one must 
look closely at parameters which most truly represent expected 
performance levels of the design. The most appropriate 
performance indicator for FPGA device speeds is the 
measurement of the most commonly used macros within that 
device. Two such measures are the 16-bit accumulator and the 
16-bit binary loadable counter. These and other performance
measurements will be covered, in depth, over the next few chapters
of ‘The FPGA Design Guide’.


The FPGA Design Guide 


The FPGA Design Guide has been composed so that you, our 
customer, can become more familiar with Actel’s highly flexible, 
efficient, easy-to-use design environment. The Design Guide
addresses questions regarding the ALS environment, the ACT 1 and
ACT 2 device families, and the technologies behind Actel products.
It also offers sample application designs which have been
implemented within the ALS design environment.


The Design Guide provides useful information for making 
engineering estimates of: 


Performance — How fast will my application run? 
Capacity — How much logic will fit into an FPGA? 
Power — How much power will be dissipated? 


The papers explain the underlying issues, provide factual data
for exact calculations, and offer useful rules of thumb for
estimating your design requirements before you actually start
designing your FPGA. The process of converting designs composed of
multiple PLDs into FPGA-based designs is described in the Design
Tools section.


Additionally, the Design Guide gives practical device and design 
examples for using FPGAs. The following topics are addressed: 


• A TTL Designer's Guide to Using FPGAs: Explores the
  issues and concerns of TTL users who are thinking of
  designing with FPGAs.

• Designing Counters with the ACT 2 Architecture: How to
  get the fastest and most efficient counter designs using the
  ACT 2 architecture.

• Implementing Fast Counters Using ACT 2 FPGAs:
  Another technique for creating high performance counters.

• Increase FPGA Performance Using Module/Speed
  Trade-Offs: Explains trade-offs between performance and
  area.

• Designing Adders and Accumulators with the ACT 2
  Architecture: Explains how Actel devices are uniquely
  well suited to implementing fast adders. Making an adder
  into an accumulator can be done with no additional
  hardware or performance degradation.

• FPGAs are Better for State Machines than PLDs: Shows
  you how to design state machines applying the efficient one
  state per bit approach.

• Page Mode DRAM Controller: Design example showing
  how a complex memory controller can be implemented on
  Actel devices. The controller is one of the Actel
  Supermacros you can customize to save design time.

• Four-Channel DMA Controller: Another Supermacro
  design example. Can be used with the DRAM controller in a
  computer system.

• Using FPGAs for Digital PLL Applications: DSP
  applications are well suited for the flexible architecture of
  FPGAs. The entire design, except for the crystal, can be put
  on an FPGA.








Actel Architectural Advantages





Introduction 


Actel Field Programmable Gate Arrays are very well suited for a 
variety of applications. In fact, the primary strength of the Actel 
FPGA architecture is its general purpose applicability. Unlike 
some high density programmable logic devices which only 
implement a few logic functions efficiently, Actel implements all 
the most common system logic functions efficiently. This is 
possible because of the ‘fine-grain’ structure of the logic module. 
This architectural flexibility allows large logic functions to be 
efficiently implemented by putting together the appropriate small 
sized modules, eliminating the wastage associated with ‘coarser- 
grained’ programmable logic families. Speed is not sacrificed for 
this flexibility because of the extremely low impedance of the Actel 
patented PLICE antifuse. This patented low impedance antifuse 
technology is the key ingredient to making flexible and fast FPGAs 
like the ACT™ 1 and ACT 2 families. 


One of the other key architectural advantages of the Actel FPGA is 
that it is register-rich in nature and has very fast narrow gating logic 
modules (fine-grained). In comparison, PLDs generally have far
fewer registers and wide gating logic modules (coarse-grained).
Looking at the main types of logic functions system designers need 
is instructive in showing the overall flexibility of the register-rich, 
fast narrow gating, fine-grained Actel FPGA architecture. System 
designers categorize the types of functions they implement as 
primarily control logic (state machines), datapath logic, and random 
logic (glue). 


Control Logic 


The control logic portion of the design is usually less than 20% of
the gate count of a typical high density design. State machines are
the major portion of the control path. They are usually distributed,
which keeps each one to between 8 and 16 states. A design might,
however, have from 2 to 10 of these small control sections.


These designs use a number of registers, counters, and random 
logic. Experienced designers take advantage of the register-rich 
and narrow gating nature of the Actel FPGA architecture and use 
more state bits than the traditional PLD designers. This results in 
faster and lower gate count designs.¹ Thus, a lot of registers and
fast, narrow to medium width logic will be optimal for the control 
section. Over the next few years, the control portion of design will 
grow in density slower than the datapath portion. This means that 
more and more of the design will be datapath oriented and less will 
be control oriented. (For example, the bus size would easily go from 
16 to 32 bits to double the datapath, but control would not change 
at all!) 
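
To make the "more state bits" point concrete, the sketch below compares flip-flop counts for a densely encoded state machine against one flip-flop per state, and steps a small one-state-per-bit machine. It is a minimal illustration under our own assumptions: the function names and the four-state ring are hypothetical, not taken from any Actel macro.

```python
def flip_flops_needed(num_states):
    """Return (binary_encoded, one_per_state) flip-flop counts."""
    binary = max(1, (num_states - 1).bit_length())
    return binary, num_states

def one_hot_step(state, transitions):
    """Advance a one-hot state vector; transitions maps state index -> next index."""
    current = state.index(1)
    nxt = [0] * len(state)
    nxt[transitions[current]] = 1
    return nxt

# State counts in the range quoted in the text.
for n in (8, 16):
    print(n, flip_flops_needed(n))       # 8 (3, 8) then 16 (4, 16)

# Hypothetical four-state ring: 0 -> 1 -> 2 -> 3 -> 0.
ring = {0: 1, 1: 2, 2: 3, 3: 0}
print(one_hot_step([1, 0, 0, 0], ring))  # [0, 1, 0, 0]
```

With one bit per state, each next-state input depends only on the few states that can lead to it, which is why the approach trades extra registers for narrower, faster gating.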


Datapath 


The datapath portion of the design is usually about 40% to 60% of 
the gate count of a typical design. The major elements of the 
datapath are multiplexors (muxes), registers, and counters. 
Multiplexors are used to select the data source for downstream 
operation. The most common example is where on chip status 
registers need to be readable by the microprocessor. All these 
registers need to be multiplexed to the output port. Sometimes this
requires 16 inputs, and if the registers are 16 bits wide, 16 outputs 
are needed. Other common muxing examples are muxes on the 
input of a register to allow a load/hold (clock enable), muxes to 
select the input for an adder, muxed register or combinatorial output 
for multiplexed address/data buses, muxes to implement a register file 
(4 registers with 4-to-1 mux to select the output word), and muxes for
bringing test points off chip. A typical design might use around 
20% to 30% of the gates for multiplexing. 


The Actel FPGA architecture implements 4-to-1 muxes in a single 
level. This makes the above-described multiplexing functions 
particularly efficient in the Actel architecture. 
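
As a rough illustration of the status-register readback example above (16 registers, each 16 bits wide), the following sketch counts how many 4-to-1 muxes and how many mux levels a selector tree needs. It assumes every stage is built only from 4-to-1 muxes. The function name is ours, and the count says nothing about how the Actel tools would actually map the logic.

```python
def mux4_tree(num_inputs):
    """Return (mux_count, levels) for an N-to-1 selector built from 4-to-1 muxes."""
    count, levels, signals = 0, 0, num_inputs
    while signals > 1:
        stage = -(-signals // 4)   # ceiling division: muxes needed in this level
        count += stage
        signals = stage
        levels += 1
    return count, levels

muxes_per_bit, levels = mux4_tree(16)
print(muxes_per_bit, levels)   # 5 2
print(muxes_per_bit * 16)      # 80 muxes for a 16-bit-wide readback port
```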


Registers are another common datapath element. Examples are 
for holding parameter data (the most popular being start values for 
timer/counters), control configuration information (like in a 
UART where protocol, parity sense, and interrupt priority are 
stored values), interrupt mask registers, base memory locations, 
scratch pad memory, etc. These registers typically need global 
reset, individual reset, clock enable, readback capability, and 
usually have no logic on the input to the register. A typical design 
would probably use around 20% to 30% of the gates for this type of 
register. 


The Actel FPGA architecture is very register-rich. Almost half of 
the gate count of an Actel device can be used as registers. This 
makes the above-mentioned functions very efficient in the Actel 
architecture. The availability of combinatorial logic in front of the 
register in the ACT 2 module also allows the efficient implementation 
of common logic functions in front of the register. 


Counters account for a large portion of the typical design. These 
can be used for address generation for external RAM (graphics for 
example), delay timing (in particular in communications applications), 
event counting, and a variety of algorithmically oriented functions. 
Usually these counters need global and individual resetting, 
loading, and count enable control. Sizes range from 4-bits to 24-bits 
with the higher range being used for address generation or very 
long counters for time stamping. These types of counters would 
account for around 20% to 30% of the modules in a typical design, 
but could go as high as 50% in the memory control oriented designs 
(graphics and communications). 







The critical path in most counter designs is through the fastest,
least significant bits. The speed at which these bits operate determines
the overall speed of the counter. These least significant bits need a
fast narrow logic implementation to meet the critical path 
requirements. Thus, the register-rich, fast narrow gating Actel 
FPGA architecture efficiently implements this function. 
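
The sketch below shows why the least significant bits are the "fastest" ones: in a binary up-counter, each bit changes half as often as the bit below it. It only counts toggles in a software model and does not represent the timing of any Actel counter macro.

```python
def toggle_counts(width, cycles):
    """Count how often each bit of a binary up-counter changes over `cycles` clocks."""
    counts = [0] * width
    value = 0
    mask = (1 << width) - 1
    for _ in range(cycles):
        nxt = (value + 1) & mask
        changed = value ^ nxt              # bits that differ between old and new value
        for bit in range(width):
            counts[bit] += (changed >> bit) & 1
        value = nxt
    return counts

print(toggle_counts(4, 16))   # [16, 8, 4, 2], least significant bit first
```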


The datapath portion of the design is thus very narrow gate (mux) 
oriented and register intensive. In general these functions will 
require a number of registers and outputs, but primarily narrow 
and fast logic functions in front of the register. 


Random Logic 


Random logic usually makes up the remainder of the design. 
Asynchronous registers, generic AND/OR/XOR gates, latches, 
multiplexors, decoders, and other 7400-sounding names are all
used. As the name implies, this type of logic requires random
interconnect and has the largest fan-out variation (from 1 to 100 is
possible). Fast narrow gating is usually required since these
functions tend to be very simple. Fine-grain logic implements these
best with few wasted gates (high utilization). Interconnect will be
very random and a high speed, easily routed architecture will be
required.


Actel FPGAs meet the requirements of random logic functions 
exceptionally well. The fast narrow gating, and high speed, high 
fan-out capability of the interconnect make random logic a very 
natural fit for Actel FPGAs.


Conclusion 


Actel FPGAs are efficient at all the major logic functions system 
designers require. In particular, datapath functions (registers, 
multiplexing and counting), control logic functions (state 
machines), and random logic functions are all exceptionally suited 
for the register-rich, fast narrow gating, fine-grained Actel FPGA 
architecture. This flexibility is attributed to the Actel patented 
PLICE low impedance antifuse technology. 


1. “FPGAs are Better for State Machines than PLDs,” The
   FPGA Design Guide, Actel Corporation, Sunnyvale, CA
   (August 1991).





High-Density Programmable Logic Taxonomy





High-density programmable logic devices can be easily categorized 
by looking at the architecture of the device and the technology 
utilized to fabricate the device. The resulting ‘Family Tree’ allows
every existing device to be categorized, and some non-existent 
devices to be hypothesized. The first step in classification will be to 
determine the architecture of the device. Architecture will consider 
two characteristics: logic structure and interconnect structure. 


Logic structure can be AND/OR array oriented (e.g. PLA or PAL®) 
where the AND-terms or OR-terms are either programmable or 
fixed. PLAs have both terms programmable, while PALs have only 
AND-terms programmable. Notice if AND-terms are fixed and 
OR-terms are programmable you get a PROM. If the logic 
structure is not AND/OR array oriented, it may be MUX based 
(ACT™ from Actel), Table-Lookup based (LCA™ from Xilinx) or 
Simple-Gate based (ERA from Plessey). Many other logic 
implementations are possible, leaving a number of options open 
for further research. 
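
The AND/OR array rule in the paragraph above reduces to a two-input decision. A minimal sketch follows; the function name and the label for the fully fixed case are our own, since the text does not name that case.

```python
def classify_and_or_array(and_programmable, or_programmable):
    """Name the AND/OR array type from which term planes are programmable."""
    if and_programmable and or_programmable:
        return "PLA"
    if and_programmable:
        return "PAL"
    if or_programmable:
        return "PROM"
    return "fixed (not programmable)"   # case not named in the text

print(classify_and_or_array(True, True))    # PLA
print(classify_and_or_array(True, False))   # PAL
print(classify_and_or_array(False, True))   # PROM
```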


Interconnect structure can either be full interconnect 
(MegaPAL™ from MMI, PROM) or partial interconnect. Partial 
interconnect can be done in a distributed fashion, using channeled
routing for example (ACT from Actel, LCA from Xilinx) or in a
centralized fashion (Switch Matrix in MACH from AMD, PIA in 
MAX® from Altera). A wide spectrum of interconnect is actually 
possible, using a hierarchy of local and global connectivity. This is 
also an area for future research. 


The technology aspect of our taxonomy relates to the device used 
to control the programming of the device. EPROM, EEPROM, 
SRAM, fuses, and antifuses are all potential technologies for 
implementing the necessary programming elements. Presently 
only a single technology is used on a device, but it’s possible to 
hypothesize a High-Density Programmable Logic product which 
would use a mixture of technology (SRAM with EEPROM for 
nonvolatility for example). 


Once these attributes are combined, a single ‘Family Tree’ as shown 
in Figure 1 results. The initial branches are architecture based with 
the final branches technology based. Most existing devices are 
indicated, with some ‘extinct’ products also listed. Empty spots are 
potential areas for new products, or may be vacant because they are 
not competitive. 
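
The architecture branches of such a tree can be sketched as a small table, from which the "empty spots" fall out by enumeration. The sketch below is illustrative only. It lists just the devices for which the text gives both a logic structure and an interconnect structure, treats MegaPAL as an AND/OR array on the strength of its name (an assumption), and omits the technology branches. Here "empty" means only "not named in this sketch", not that no such product exists.

```python
from itertools import product

LOGIC = ("AND/OR array", "MUX", "table lookup", "simple gate")
INTERCONNECT = ("full", "partial, distributed", "partial, centralized")

# (logic structure, interconnect structure) for devices named in the text.
DEVICES = {
    "ACT (Actel)": ("MUX", "partial, distributed"),
    "LCA (Xilinx)": ("table lookup", "partial, distributed"),
    "MegaPAL (MMI)": ("AND/OR array", "full"),   # logic structure assumed from the name
    "PROM": ("AND/OR array", "full"),
}

occupied = set(DEVICES.values())
empty = [combo for combo in product(LOGIC, INTERCONNECT) if combo not in occupied]

for logic, interconnect in empty:
    print(f"{logic} / {interconnect}")
```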





Figure 1. High-Density Programmable Products Family Tree
(branch labels include PARTIAL, CENTRALIZED, ROUTED, and ANTIFUSE)
















Design Prediction 


Estimating Actel FPGA Delays ..... Page 2-1 


Estimating Gate Capacities of Programmable Logic Devices ..... Page 2-15


Predicting Power in ACT FPGAs .... Page 2-25 







Estimating Actel FPGA Delays





Introduction 


You must determine internal, chip level, and system level delays in 
order to predict the performance of Actel Field Programmable 
Gate Array based circuits. We will take you step by step through the 
details required to determine these delays in this application note 
as follows: 


• A simple model is described that includes Actel internal
  delays.

• These internal delays are then evaluated in order to
  estimate the chip and system level performance: input,
  output, and system clock delays, internal combinational
  component delays, internal sequential component delays,
  internal routing delays, and internal combinational block
  delays.

• We next estimate the chip level timing: AC characteristic
  delays, synchronous clock mode delays, then the input
  setup and hold times. Here, timing characteristics are modeled
  so you can evaluate the worst case paths and make sure the
  Actel devices meet your speed requirements.

• Finally, we calculate the system level performance of Actel
  to Actel transmissions, Actel to PAL® transmissions, and
  Actel to TTL transmissions.










You should be familiar with the basic functions and architecture of
the Actel Field Programmable Gate Arrays (FPGAs) described in
our databook.¹ Although Actel typically specifies internal gate
delays as a function of the loading or fan-out of the gate, we will
take the traditional PAL/TTL approach and deal with logical
gate delays and routing delays as separate issues. Typical PAL
product terms will be built up from basic Actel building blocks, and
the chip level timing estimates will be based on a product term
model of the Actel device. That's not to say that FPGAs and PALs
are functionally equivalent. In fact, Actel FPGAs are much more
flexible than PALs: you are not limited by product terms,
registers are essentially unlimited, and you can connect any
internal gate to any input or output.


In the timing model in Figure 1, all of the component delays are 
within the blocks, and routing delays are between each block. 


Three speed versions of ACT™ 1 FPGAs are available: standard 
speed A1000s, A1000-1 devices that are 15% faster, and A1000-2 
devices that are 25% faster. Unless otherwise noted the timing 
parameters in this application note are given for worst case 
commercial conditions. You can refer to the Actel databook to get 
derated delay values for industrial and military temperature ranges. 
The components of the timing model in Figure 1 are analyzed in 
the section that follows. 

















Figure 1. Actel FPGA Timing Model
(Blocks shown: CLOCK and INPUT buffers, COMBINATORIAL LOGIC,
SEQUENTIAL ELEMENTS, output COMBINATORIAL LOGIC, and OUTPUT
buffers; FEEDBACK has NO INHERENT DELAY.)













Evaluating Internal Delays 


We will partition the circuit into blocks and then evaluate the 
delays. First let us determine the input, output and system clock 
delays, the internal combinational delays, sequential component 
delays, internal routing delays, long track delays and internal 
combinational block delays. Keep in mind that the component 
delay numbers presented here DO NOT include routing delays. 
Routing delays are added when connecting elements together. 
This traditional approach differs from the one described on Actel 
datasheets, but both methods will yield the same chip level timing. 


The first step is to analyze the delays that result from getting onto
and off of the FPGA. A data signal must pass through an Input or
Bidirectional buffer to get onto the Actel device. Then, to exit the
device, the data signal must pass through an Output, Tri-State or
Bidirectional buffer. Clock signals must go through the Clock
buffer to make use of the dedicated on-chip clock network. The I/O
and system clock delays are presented in Figure 2. Again, all of the
delay values in this note are for commercial operating conditions
and are within processing variations. Equations are provided
within each figure so you can calculate the delays for your circuits in
other conditions.


[Figure 2 table: MIN and MAX values for the A1000-2, A1000-1, and
A1000, grouped as Input Delays (t_INH, t_INL), Clock Delays, and
Output Delays (t_OUTHL, t_OUTLH, t_OUTHZ, t_OUTZH, t_OUTLZ,
t_OUTZL).]


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case processing.
Note 1: Assumes 50 pF CMOS load on outputs; refer to Actel databook for additional derating.


Figure 2. Input, Output, and Clock Buffer Delays








Input Delays 


Unlike most PALs, the delay through the standard input buffer 
(INBUF) is identical to the delay through the bidirectional input 
buffer (BIBUF). 


Output Delays 


The delays on the output buffers are a function of the load
capacitance on the output pins, and the delays for any of the output
paths (OUTBUF, TRIBUF, or BIBUF) are identical. These
output delays are specified by Actel for a standard 50 pF load. The
delays can be extrapolated for load capacitances down to a
minimum of 15 pF.


Clock Delays 


ACT 1 family FPGAs contain a global clock network which is 
accessed via the Clock Input buffer (CLKBUF). The delays on this 
buffer are a function of the number of sequential elements loading 
the clock network and represent the delay from the clock input pin, 
through the Input buffer and through the clock network, right up to 
the input of the flip-flop. 


As with traditional PAL or TTL designs, you must also take care to 
deal with timing skew on the clock input signal. Clock skew on the 
network has been accounted for in Figure 2 by including MAX 
WORST and MAX BEST timing numbers. 













Internal Combinational Component Delays 


Before you can calculate the combinational delay through a block
of logic, you must first determine the number of levels of Actel logic
modules that it will take to realize a specific function. You can find
descriptions of all of the components that will map into a single or
dual logic module in the Actel databook. Logic diagrams of four
single-module macros and four two-module macros are shown in
Figure 3, and the delays for both types are shown in Figure 4.


A single Actel logic module can accommodate many four-input 
functions. Larger combinational functions can be constructed by 
combining logic modules. 


Internal Sequential Component Delays 


There are two types of sequential elements in the library — 
Latches and Registers. The Actel Databook lists all of the available 
sequential elements. The Register delays are shown in Figure 5. 





Figure 3. Sample Actel Logic Modules
(Panels: SINGLE LOGIC MODULES, DUAL LOGIC MODULES)











Single Logic Module (3 ns typical)
Dual Logic Module (3.9 ns typical)


PARAMETER   A1000-2   A1000-1   A1000   UNITS
t_COMB1                         4.6     ns
t_COMB2                         6.0     ns


Figure 4. Combinational Delays





PARAMETER   A1000-2   A1000-1   A1000   UNITS
t_CQ        3.4       3.8       4.6     ns


Figure 5. Register Timing 


Internal Routing Delays 


When estimating path delays within an Actel FPGA, the internal 
routing delays must be included between every component along a 
path. There is an inherent connection delay (similar to a PC run 
delay) between every component in the device. 


There are three types of routing delays to consider on an Actel
FPGA: Critical Net delay, Typical Net delay, and Long Net delay. A
Critical Net delay is determined by specifying that net as critical
before placing and routing your design. It will then be placed and
routed by the CAD system to have the lowest possible delays.
Ninety percent of the net delays will fall under the value specified as
the Typical Net delay. The Long Net delay is an extreme worst case
delay that may occur on a few of the nets in a design. Figure 6 shows
these types of delays as a function of the internal fan-out.





[Figure 6 table: Routing Delays for the A1000-2, A1000-1, and
A1000 as a function of internal fan-out.]


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case processing.


Figure 6. Routing Delays
















Internal Combinational Block Delays 


One of the advantages of the Actel architecture is that 
combinational functions can be as complex or as simple as is 
needed. For example, product terms can be of any size. With PALs, 
you can determine to the nanosecond how long the delay will be 
through a combinational block; this is not the case with FPGAs. 
However, you can estimate delays for combinational product terms 
before the design is implemented. 


When you look through the Actel library you will see that a 
four-input OR gate is a single logic module. Any size AND or OR 
gate can be created by cascading 3 or 4 input gates as shown in 
Figure 7. 
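The cascading rule above can be turned into a quick level count. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes every logic module is used as a plain gate with at most four inputs, which ignores the wider functions a single Actel module can sometimes absorb.

```python
def gate_levels(n_inputs, fan_in=4):
    """Levels of fan_in-input gates needed to combine n_inputs signals."""
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division
        levels += 1
    return levels


def product_term_levels(n_terms, and_inputs, fan_in=4):
    """Levels for n_terms AND terms (and_inputs wide) that are ORed together.

    Assumption: one level minimum for the AND stage, and no OR stage
    when there is only a single product term.
    """
    and_levels = max(1, gate_levels(and_inputs, fan_in))
    or_levels = gate_levels(n_terms, fan_in)
    return and_levels + or_levels


if __name__ == "__main__":
    print(product_term_levels(1, 4))    # one product term
    print(product_term_levels(4, 6))    # four terms of 6-input ANDs
    print(product_term_levels(8, 16))   # eight terms of 16-input ANDs
```

Under these assumptions, four product terms of six-input ANDs come out at three levels and eight product terms of 16-input ANDs at four levels, which agrees with the level counts this note quotes for its state machine examples.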


The timing estimations for the small, medium, and large blocks of
combinational logic in Figure 7 were made by including the routing
delays on the input and output of the logic block in the product
delay calculations. The equations used to calculate the estimated
delays are shown by each product term. Actel FPGAs are designed
so that it is often possible to route the critical paths of a
product term function through fewer levels of logic. Keeping
critical signals on the fewest possible logic levels allows a state
machine to run at a higher frequency.


Chip Level Timing 


Once you’ve identified the critical paths in a design and 
represented those paths in terms of Actel logic module delays, you 
are ready to calculate the chip level timing parameters. These logic 
module delays can be broken down into: 


• AC characteristic or unclocked path delays. 


• Synchronous clock mode delays: clock-to-Q, setup and hold
times, and the maximum frequency when using the global clock
buffer.


We will now analyze the representative critical paths in Figure 8. 
All of the output buffers in these examples are assumed to be under 
a 50 pF TTL load. The AC characteristics reflect potential delays 
incurred on signals passing unclocked through the FPGA. As you 
can see in Figure 8, the maximum on/off time is 23.9 ns for the 
A1000-2. 
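The unclocked path delay is simply the input buffer delay, the product term delay, and the output buffer delay added together. The sketch below is a hedged illustration, not databook data: the function name is hypothetical, the product term delays are the worst-case one-product-term values as they appear in Figure 7 (with the column-to-speed-grade mapping assumed), and the combined buffer delay of 13.1 ns is inferred by subtracting a Figure 7 value from the matching Figure 8 value (25.3 - 12.2).

```python
# Assumption: combined input + output buffer delay in ns, inferred as
# (Figure 8 path delay) - (Figure 7 product term delay) = 25.3 - 12.2.
IO_BUFFERS_NS = 13.1

# Worst-case one-product-term delays in ns, as read from Figure 7.
# The speed-grade assignment of each value is an assumption.
T_PROD1_MAX_NS = {"A1000-2": 10.8, "A1000-1": 12.2, "A1000": 14.6}


def unclocked_path_delay(t_prod_ns, io_buffers_ns=IO_BUFFERS_NS):
    """Pin-to-pin delay: buffers plus the logic between them."""
    return round(io_buffers_ns + t_prod_ns, 1)


if __name__ == "__main__":
    for grade, t_prod in T_PROD1_MAX_NS.items():
        print(grade, unclocked_path_delay(t_prod), "ns")
```

With these inputs the A1000-2 result reproduces the 23.9 ns figure quoted in the text.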


Maximum Frequency 


The synchronous clock mode is presented in Figure 9. The delays 
are calculated assuming that all sequential elements of the paths 
under consideration are clocked off of the global clock network. 


The maximum pipelined frequency occurs during a simple register 
to register transfer. A simple logic function could be performed at 
this maximum transfer rate by taking advantage of an Actel 
flip-flop with a 2:1 multiplexor (Figure 9, Case 1). 


A typical state machine implementation (Figure 9, Case 2) might
require logic with 4 product terms of six-input AND gates between
flip-flops. This translates into an operating frequency of
approximately 20 MHz for the standard speed grade FPGA. When
expressed in terms of Actel logic module delays, this state machine
example is rendered in three levels of logic between flip-flops with
four input signal loads.
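The register-to-register frequency estimate follows the pattern used in Figure 9: one clock period must cover the clock-to-Q delay, the logic between the flip-flops, and the setup time of the receiving flip-flop. The Python sketch below is a minimal illustration. The function name is hypothetical, and the 5.0 ns setup time is a placeholder chosen for the example, not a databook value; the other two numbers are the standard speed grade values as they appear in Figures 5 and 7.

```python
def max_frequency_mhz(t_cq_ns, t_logic_ns, t_su_ns):
    """Maximum clock frequency in MHz for a register-to-register path."""
    period_ns = t_cq_ns + t_logic_ns + t_su_ns
    return 1000.0 / period_ns


if __name__ == "__main__":
    t_cq = 4.6      # clock-to-Q, standard speed grade (Figure 5)
    t_prod4 = 37.7  # four product terms of 6-input ANDs, worst case (Figure 7)
    t_su = 5.0      # placeholder setup time, NOT from the databook
    print(f"{max_frequency_mhz(t_cq, t_prod4, t_su):.1f} MHz")
```

With the placeholder setup time the result lands near the roughly 20 MHz quoted above; substitute the databook setup time for a real estimate.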


There are also cases where more complex logic is required. Indeed 
most PAL architectures now offer 8 product terms with 16 inputs 
per product term. This translates into four levels of Actel modules 
between registered elements. 


Larger functions than those in Figure 9 can be built because there is 
no limitation on the number of product terms per register. You 
need to be sure to determine which input signals are the timing 
critical logic terms. In most cases you’ll be able to arrange the logic 
so that they go through the fewest number of logic levels. 


Since there is no inherent feedback delay on an FPGA, a net delay 
depends only upon how many gates it is driving and on how far 
apart the gates are physically on the device. This means the 
maximum frequency with feedback analysis will be the same as the 
maximum frequency without feedback. 


Input Setup and Hold Times and the System CLK 


Input Setup/Hold Times, with respect to the active system clock edge, 
are also important synchronous operation timing parameters. Three 
Setup/Hold Times calculations are discussed below. In Figure 10, Case 
1, the data is registered right after entering the Actel device to attain 
the lowest setup time possible. You can see in the figure that the 
Actel architecture allows you to come on to the FPGA and go 
directly into a flip-flop. With PALs you must go through the 
product term and into the flip-flop, even if you’re not using the 
product term. 


In many instances data must be gated before registration as in 
Figure 10, Case 2. You can see that the hold times in this example 
are all negative. This means that the data stream can be severed 
before the clock edge, and data will still be clocked correctly. 


In the third example (Figure 10, Case 3), data is input to a state
machine, and then goes through a four-product term function
before it is registered. Once again the data hold time is negative,
and the data needs to be valid for only about 15 to 25 ns,
depending on the part's speed grade.
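The setup and hold calculations in Figure 10 share one idea: the setup time seen at the pins is the data path delay plus the flip-flop setup time, less the fastest clock path, and the hold time is the slowest clock path less the data path delay. The sketch below restates that in Python. All names and numbers are hypothetical round values for illustration, and it assumes the flip-flop's own hold requirement is zero.

```python
def external_setup_ns(data_delay_max_ns, t_su_ns, clock_delay_min_ns):
    """Setup time at the pins: slowest data path against fastest clock."""
    return data_delay_max_ns + t_su_ns - clock_delay_min_ns


def external_hold_ns(clock_delay_max_ns, data_delay_min_ns):
    """Hold time at the pins: slowest clock against fastest data path.

    Assumption: the flip-flop itself needs no hold time.
    """
    return clock_delay_max_ns - data_delay_min_ns


if __name__ == "__main__":
    # Hypothetical delays in ns, not databook values.
    setup = external_setup_ns(data_delay_max_ns=20.0, t_su_ns=5.0,
                              clock_delay_min_ns=7.0)
    hold = external_hold_ns(clock_delay_max_ns=10.0, data_delay_min_ns=12.0)
    print(setup, hold)
```

A negative hold result, as in this example, corresponds to the situation described above: the data may change before the clock edge arrives at the pin and still be captured correctly.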


Clock High to Output Valid 


The clock to output valid time is another important timing 
parameter. In most of the smaller PALs, the flip-flop output goes 
directly to an output pin, and cannot be gated prior to going off 
chip. To attain the lowest clock-to-q time, limit the fan-out on the 
flip-flop and minimize the loading on the clock buffer, as in Figure 
11, Case 1. 


In larger scale designs, the clock loading is typically around 160 (80 
flip-flops or 160 latches) with the output flop driving gates in 
addition to the output buffer. 


Actel’s flexible architecture means that the output does not have to 
go directly from a flip-flop, but can be gated by as many levels of 
logic as required. 








Estimating System Level Performance 


After estimating chip level timing parameters, system level
performance can be estimated. System level performance depends
on the ‘message passing’ constraints (FPGA to FPGA, FPGA to
PAL, FPGA to TTL) imposed by the FPGA. For example, in the
Actel to Actel message transmission example in Figure 12 you can
expect the maximum FPGA to FPGA transfer rates to be
between 16 and 27 MHz, depending upon the internal device loading
and speed grade.


An Actel to PAL transmission example is shown in Figure 13. The 
PALs in the circuit are 15 and 25 ns devices. You can expect the 
emerging 5 ns PALs to slash Actel to PAL transmission times 
significantly. 


Finally, the Actel to TTL transmission example in Figure 14 can
run at 20 to 30 MHz. This high transmission rate is due in part
to the short setup time of TTL F series flip-flops.
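The system level figures all use the same budget: the sending device's clock-to-output time, the board delay, the board clock skew, and the receiving device's setup time must fit in one clock period. The following Python sketch shows that budget. The 2 ns PCB delay, 5 ns maximum PCB skew, and the 15 ns and 25 ns PAL setup times are the values shown in Figures 12 through 14; the 18.0 ns clock-to-output time and the function name are hypothetical.

```python
def system_frequency_mhz(t_cq_ns, t_su_ns, pcb_delay_ns=2.0, pcb_skew_ns=5.0):
    """Chip-to-chip transfer rate in MHz for a synchronous interface."""
    period_ns = t_cq_ns + pcb_delay_ns + pcb_skew_ns + t_su_ns
    return 1000.0 / period_ns


if __name__ == "__main__":
    t_cq = 18.0  # hypothetical Actel clock-to-output time in ns
    for pal_setup in (15.0, 25.0):  # PAL setup times from Figure 13
        rate = system_frequency_mhz(t_cq, pal_setup)
        print(f"setup {pal_setup} ns: {rate:.1f} MHz")
```

Because the board delay and skew are fixed overhead, a receiver with a shorter setup time raises the achievable rate directly, which is the effect the text attributes to F series flip-flops.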


You should now be able to follow the procedures described here
to estimate the internal, chip level, and system level delays of
your Actel based circuits.


List of Figures 


Figure 1. Actel FPGA Timing Model 

Figure 2. Input, Output, and Clock Buffer Delays 
Figure 3. Sample Actel Logic Modules 

Figure 4. Combinational Delays 

Figure 5. Sequential Delays 

Figure 6. Routing Delays 

Figure 7. Product Term Delays 

Figure 8. Three AC Characteristic Examples 
Figure 9. Three Maximum Frequency Examples 
Figure 10. Input Setup/Hold Time Examples 
Figure 11. Three Clock-to-Q Examples 

Figure 12. Actel to Actel Message Transmission 
Figure 13. Actel to PAL Message Transmission 
Figure 14. Actel to TTL Message Transmission 


1.) ACT Family Field Programmable Gate Array Databook, Actel 
Corporation, Sunnyvale, CA (March 1991) 


PARAMETER   A1000-2         A1000-1         A1000           UNITS
            MIN     MAX     MIN     MAX     MIN     MAX
T_PROD1     3.4     10.8    3.4     12.2    3.4     14.6    ns
T_PROD4     11.2    27.6    11.2    31.2    11.2    37.7    ns
T_PROD8     15.1    37.2    15.1    42.0    15.1    50.8    ns


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1: One Product Term

T_PROD1 = t_RTP1 + t_COMB1 + t_RTP1


CASE 2: Four Product Terms of 6 Input ANDs

T_PROD4 = t_RTP4 + 3*t_COMB1 + t_RTP1


CASE 3: Eight Product Terms of 16 Input ANDs

T_PROD8 = t_RTP4 + 4*t_COMB1 + t_RTP1


Figure 7. Product Term Delays




















PARAMETER   A1000-1         A1000           UNITS
            MIN     MAX     MIN     MAX
T_PD1       6.6     25.3    6.6     27.7    ns
T_PD4       14.4    46.2    14.4    52.7    ns
T_PD8       18.3    60.0    18.3    68.8    ns


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1:

T_PD1 = t_IN + T_PROD1 + t_OUT


CASE 2:

T_PD4 = t_IN + T_PROD4 + t_OUT


CASE 3:

T_PD8 = t_IN + T_PROD8 + t_OUT


Figure 8. Three AC Characteristic Examples








[Table: F_MAX1, F_MAX4, and F_MAX8 for each speed grade; the one
legible entry is F_MAX1 = 53.8 MHz.]


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1:

F_MAX1 = 1/(t_CQ + t_RTP1 + t_SU)


CASE 2:

F_MAX4 = 1/(t_CQ + T_PROD4 + t_SU)


CASE 3:

F_MAX8 = 1/(t_CQ + T_PROD8 + t_SU)


Figure 9. Three Maximum Frequency Examples











[Table: T_SU and T_H for each case and speed grade.]


VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1:

T_SU1 = t_INH + t_RTP1 + t_SU - t_CLK80(MAX BEST)
T_H1 = t_CLK80(MAX WORST) - (t_INH + t_RTP1)


CASE 2:

T_SU2 = t_INH + T_PROD1 + t_SU - t_CLK160(MAX BEST)
T_H2 = t_CLK160(MAX WORST) - (t_INH + T_PROD1)


CASE 3:

T_SU3 = t_INH + T_PROD4 + t_SU - t_CLK320(MAX BEST)
T_H3 = t_CLK320(MAX WORST) - (t_INH + T_PROD4)


Figure 10. Input Setup/Hold Time Examples








VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1:

T_CQ1 = t_CLK80 + t_CQ + t_RTP1 + t_OUTLH


CASE 2:

T_CQ2 = t_CLK160 + t_CQ + t_RTP4 + t_OUTLH


CASE 3:

T_CQ3 = t_CLK320 + t_CQ + t_RTP1 + t_OUTLH


Figure 11. Three Clock-to-Q Examples








VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1: Actel Dev #1 to Actel Dev #2, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

F_CASE1 = 1/(T_CQ1 + PCB_DELAY + PCB_SKEW + T_SU1)


CASE 2: Actel Dev #1 to Actel Dev #2, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

F_CASE2 = 1/(T_CQ2 + PCB_DELAY + PCB_SKEW + T_SU2)


Figure 12. Actel to Actel Message Transmission








VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1: Actel Dev #1 to PAL, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

T_SUPAL = 15 ns

F_CASE1 = 1/(T_CQ1 + PCB_DELAY + PCB_SKEW + T_SUPAL)


CASE 2: Actel Dev #1 to TI 20R8-25, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

T_SUPAL = 25 ns

F_CASE2 = 1/(T_CQ2 + PCB_DELAY + PCB_SKEW + T_SUPAL)


Figure 13. Actel to PAL Message Transmission

















VCC = 4.75 to 5.25 V, TA = 0°C to 70°C, best-to-worst case
processing.


CASE 1: Actel Dev #1 to TI 74F74, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

F_CASE1 = 1/(T_CQ1 + PCB_DELAY + PCB_SKEW + T_SUTTL)


CASE 2: Actel Dev #1 to TI 74F74, 2 ns PCB DELAY, 5 ns MAX PCB SKEW

F_CASE2 = 1/(T_CQ2 + PCB_DELAY + PCB_SKEW + T_SUTTL)


Figure 14. Actel to TTL Message Transmission








Estimating Gate Capacities of Programmable Logic Devices





Introduction 


Engineers can’t be blamed for being confused about the logic
capacity of FPGAs. They have to sort through all sorts of inflated
gate capacity claims when all they really want to know is how to
tell if a design will fit on a device.


We'll explain how Actel counts gates for different types of devices. 
Then, we’ll show you how to estimate the fit of a specific design for 
the different device architectures available. Our focus will be on 
Logic Cell Arrays (LCAs™), EPLDs, and the ACT™ 1 and ACT 2 
families. For each device type, we will provide design examples to 
justify our conclusions about counting gates. 


First, let's briefly explore the subject of masked gate array capacity 
since it is the basis for understanding the capacity of field 
programmable devices. 


Calculating Device Gate Capacity 


Gate Array Capacity 


Masked gate arrays are specified to contain a number of two-input
NAND gates. Utilization of the gates is a measure of the
percentage of the total number of gates on the device that a typical
design may use before running out of routing resources.


LCA Capacity


The LCA architecture is made up of rows of Combinatorial Logic
Blocks (CLBs) separated by routing tracks and switch matrices for
making connections. Each CLB contains two dedicated flip-flops
and a combinatorial function generator. The function generator is
a look-up table that can implement (on the XC3000 family) any
five-input function and many combinations of four-input functions.


In a masked gate array the logic building block is a two-input gate
and the utilization of the device is the percentage of gates that can
be connected. In an LCA the granularity of the building block is the
much larger CLB. The utilization of an LCA is a question not only
of the percentage of CLBs that can be connected, but also of the
number of gates available in the CLB that are used to implement
logic.


The results of analyzing six randomly selected XC3000 series
designs may be seen in Table 1. The number of gates utilized per
CLB ranges from just over 7 to 11.29 with the average being 8.55.
The average represents the number of gates you may expect to
utilize from each CLB. Table 2 shows the results from a Xilinx study
of the gates in a CLB for both the XC3000 and XC4000 series parts.


Table 1. LCA Design Gate Counts

            FF      CLBs    Combinatorial   Total   Fraction of   Combo Gates   Gates
Design      Gates   Used    Gates           Gates   2 FFs Used    per CLB       per CLB
Design 1    228     59      187             415     0.65          3.19          7.09
Design 2    144     34      123             267     0.71          3.62          7.85
Design 3    448     104     372             820     0.71          3.58          7.88
Design 4    192     44      149             341     0.73          3.39          7.75
Design 5    594     97      319             913     1.02          3.29          9.41
Design 6    1136    136     400             1536    1.23          2.94          11.29

Average                                             0.84          3.33          8.55


Table 2. LCA Benchmark Gate Counts XC3000 and XC4000

                Total   Number        Gates     Number        Gates
Design          Gates   XC3000 CLB    per CLB   XC4000 CLB    per CLB
ARITHMETIC      295     28            10.54     20            14.75
TIMER COUNTER   248     28            8.86      18            13.78
DATA PATH       157     20            7.85      12            13.08

Average                               9                       14


The LCA architecture is flip-flop intensive. There are a relatively
large number of flip-flops compared to the combinatorial resources
available. It has been shown that the gate utilization of LCA
devices is strongly correlated with flip-flop usage¹.


The total number of usable gates for a device equals the number of
gates per CLB times the number of CLBs that can be connected
together. The latter term depends on the routing resources
available on the device. For most LCA designs, the larger the
device, the lower the percentage of its CLBs that can be routed.
Table 3 shows the total number of gates you can expect to utilize
from XC3000 and XC4000 LCAs as a function of the expected values
of gates per CLB and usable CLBs per device.


Table 3. Typical Capacity of LCA Devices

                  Typical Gates   Total   Device        Usable   Advertised
Device    CLBs    per CLB         Gates   Utilization   Gates    Gates
XC3020    64      8.55            547     0.90          492      2000
XC3030    100     8.55            855     0.85          727      3000
XC3042    144     8.55            1231    0.80          985      4200
XC3064    224     8.55            1915    0.75          1436     6400
XC3090    320     8.55            2736    0.70          1915     9000
XC4005    196     14              2744    0.90          2469.6   5000
XC4010    400     14              5600    0.80          4480     10000


EPLD Capacity


Large EPLDs from vendors such as Altera and AMD have an
architecture similar to a group of PALs tied together by a common
routing resource. Like LCAs, these devices have a large logic
building block (sometimes referred to as a macrocell) consisting of
combinatorial logic driving a flip-flop. Unlike LCAs, EPLDs have
enough routing resources to achieve high (over 95%) utilization of
the macrocells for most designs. The results of measurements of
the capacity of the Altera EPM5128™ for several types of logic
may be seen in Table 4.


The EPLD architecture is fixed and combinatorially intensive.
There are a large number of combinatorial resources driving a
limited number of flip-flops. Designs that fit the architecture well
will get good gate utilization from the device.


Table 4. Benchmark Gate Counts for EPM5128

                Total   Designs      Gates
Design          Gates   per Device   per Device
ARITHMETIC      295     5            1475
TIMER COUNTER   248     6            1488
STATE MACHINE   153     11           1683
DATA PATH       157     9            1413
Average                              1515


ACT Capacity


The ACT devices have a small logic module with the low
granularity of a gate array as well as sufficient routing resources to
achieve high module utilization. ACT 1 devices have a single
module type, while ACT 2 devices have two types of modules.


Both ACT 1 and ACT 2 devices have been tested for capacity using
the same four benchmarks as for the EPLD. The results appear in
Table 5. Customer designs have also been used to analyze ACT 1
module gate utilizations. Table 6 shows that, on average, ACT 1
modules implement 3.22 gates per module for all types of logic. The
mixture of gates to flip-flops in a design does not affect the gate
utilization of ACT 1 devices.
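The usable gate figures in Table 3 are the product of three numbers: the CLB count, the typical gates per CLB, and the fraction of CLBs that can be routed. The short Python sketch below reproduces two rows of the table; the function name is hypothetical and the inputs are the Table 3 values.

```python
def usable_gates(clbs, gates_per_clb, utilization):
    """Expected usable gates for an LCA device."""
    return clbs * gates_per_clb * utilization


if __name__ == "__main__":
    # (device, CLBs, typical gates per CLB, utilization, advertised gates)
    rows = [
        ("XC3020", 64, 8.55, 0.90, 2000),
        ("XC4010", 400, 14, 0.80, 10000),
    ]
    for device, clbs, per_clb, util, advertised in rows:
        usable = usable_gates(clbs, per_clb, util)
        share = 100.0 * usable / advertised
        print(f"{device}: {usable:.0f} usable gates, "
              f"{share:.0f}% of the advertised count")
```

Comparing the usable figure against the advertised gate count makes the gap between the two explicit for each device.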


Estimating Capacities for a Specific Design 


If you have a design implemented in TTL or a similar discrete
technology, or if you know what logic a design would consist of, then
it is not difficult to estimate the logic resources required for each
type of device to make the design. Once the number of logic
building blocks is known, it is simple to estimate the device
requirements considering building block utilization.


The easiest approach is to divide the logic of the design into macros
and random logic. Macros are higher level functions such as
counters, adders, and decoders. Random logic consists of gates and
flip-flops that are not organized into a commonly classifiable
function. Once the design has been divided, the resource
requirements for each part may be calculated as described below
and the results summed for the total device requirement.


TTL and Macro Logic 


Table 7 lists many popular TTL functions along with the number of 
logic building blocks required to implement them on each of the 
different device types. To estimate the requirements for a design 
converted from TTL, sum the entries from the table that match or 
reasonably match components from your design. For designs that 
are not in TTL, or for TTL functions that don’t appear in Table 7, 
the components can be considered as generic macros. 
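Summing Table 7 entries is a simple lookup and add. The Python sketch below shows the idea for an A1020. The three percentages are taken from the A1020 column of Table 7; the parts list and the function name are hypothetical.

```python
# Percent of an A1020 consumed by each TTL function, read from Table 7.
A1020_PERCENT = {"74138": 2.0, "74161": 3.8, "74273": 2.9}


def macro_utilization(parts, percent_table):
    """Sum the device percentage for a parts list of {TTL type: count}."""
    return sum(percent_table[part] * count for part, count in parts.items())


if __name__ == "__main__":
    design = {"74161": 2, "74138": 1, "74273": 1}  # hypothetical parts list
    total = macro_utilization(design, A1020_PERCENT)
    print(f"{total:.1f}% of an A1020")
```

The same function works for any other device column by swapping in that column's percentages.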


When the macros are identified, they may be arranged according to 
function with all sizes of the same function (e.g. four-bit counters 
and eight-bit counters) grouped together. For each instance of a 
macro, the percent resource requirements for each architecture 
may be found in Graphs 1 through 4. 


PALs 


Estimating PAL requirements involves the use of Table 8 for each 
PAL output. The outputs may be calculated individually for each 
PAL in the design. 


To use the table, first estimate the average number of variables in 
the AND terms and count the number of terms feeding the output. 
The table shows the percentage of resources required for the 
different devices. Repeat the procedure for each output of all the 
PALs in the design and sum the results. 
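The Table 8 lookup can be sketched in Python as follows. This is an illustration under stated assumptions: the per-term percentages are the "1 term" column of Table 8, the function names are hypothetical, and the sketch assumes the percentage scales linearly with the number of terms, which matches the table to within rounding.

```python
# Percent of a device used by ONE product term ("1 term" column of Table 8).
PER_TERM_PERCENT = {
    "A1010":   {"1-4": 0.34, "5-8": 0.68, "9-12": 1.36},
    "A1020":   {"1-4": 0.18, "5-8": 0.37, "9-12": 0.73},
    "XC3020":  {"1-4": 1.56, "5-8": 3.13, "9-12": 4.69},
    "EPM5128": {"1-4": 0.78, "5-8": 0.78, "9-12": 0.78},
}


def variable_band(avg_variables):
    """Map an average variable count to the matching Table 8 row."""
    if not 1 <= avg_variables <= 12:
        raise ValueError("Table 8 covers 1 to 12 variables per term")
    if avg_variables <= 4:
        return "1-4"
    if avg_variables <= 8:
        return "5-8"
    return "9-12"


def pal_output_percent(device, avg_variables, terms):
    """Percent of the device needed for one PAL output.

    Assumption: cost grows linearly with the number of terms.
    """
    return PER_TERM_PERCENT[device][variable_band(avg_variables)] * terms


if __name__ == "__main__":
    # Hypothetical PAL output: 4 terms averaging 6 variables each.
    for device in PER_TERM_PERCENT:
        print(device, f"{pal_output_percent(device, 6, 4):.2f}%")
```

Repeating the call for every output of every PAL and adding the results gives the total device requirement described above.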







Table 5. Actel Benchmark Gate Counts for ACT 1 and ACT 2

                Total   Instances   Gates   Instances   Gates
Design          Gates   A1020       A1020   A1280       A1280
ARITHMETIC      295     6           1770    18          5310
TIMER COUNTER   248     6           1488    26          6448
STATE MACHINE   153     9           1377    41          6273
DATA PATH       157     12          1884    54          8478
Average                             1630                6627


Table 6. Actel Design Gate Counts

                    FF      Combinatorial   Total   Modules   Gates
Design      FFs     Gates   Gates           Gates   Used      per Module
Design 1    24      144     116             260     99        2.63
Design 2    143     672     846             1518    512       2.96
Design 3    171     835     461             1296    451       2.87
Design 4    162     962     573             1535    436       3.52
Design 5    42      252     629             881     220       4.00
Design 6    22      132     131             263     98        2.68
Design 7    127     714     622             1336    477       2.80
Design 8    178     1103    487             1590    549       2.90
Design 9    127     740     834             1574    469       3.36
Design 10   84      356     412             768     172       4.47
Average                                                       3.22


Random Logic


ACT 1 and ACT 2


Determining the requirements for random logic is straightforward 
for ACT 1 and ACT 2 devices because of their low granularity. 
Follow Equations 1 and 2 to determine the module requirements 
for random logic and add it to the number of modules needed for 
the macros. 


ACT 1 Modules = (FF * 2) + (Gates/3.2) (1)
ACT 2 Modules = (FF) + (Gates/3.4) (2)


Equation 1 states that the number of ACT 1 modules needed to
implement the random logic equals the number of flip-flops
times two, plus the modules for the combinatorial functions,
which is the number of gates divided by 3.2.
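Equations 1 and 2 translate directly into code. In the Python sketch below the function names and the example design are hypothetical, and rounding up to a whole module is an added assumption that the equations themselves do not state.

```python
import math


def act1_modules(flip_flops, gates):
    """Equation 1: ACT 1 modules for random logic, rounded up."""
    return math.ceil(flip_flops * 2 + gates / 3.2)


def act2_modules(flip_flops, gates):
    """Equation 2: ACT 2 modules for random logic, rounded up."""
    return math.ceil(flip_flops + gates / 3.4)


if __name__ == "__main__":
    # Hypothetical block of random logic: 20 flip-flops and 150 gates.
    print("ACT 1 modules:", act1_modules(20, 150))
    print("ACT 2 modules:", act2_modules(20, 150))
```

Add the result to the module count obtained for the macros to get the total module requirement.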


EPLD 


The architecture of EPLDs dictates a different type of calculation. 
A random flip-flop in an EPLD will consume one macrocell 
because there is only one flip-flop available. Combinatorial 
functions, unless they drive an output, can usually be put 
somewhere in a macrocell AND-OR array or an expander. Since 
flip-flop availability is usually the limiting factor, random gates can 
usually be ignored. 


LCA


For LCAs it is more difficult to calculate the random logic
requirements. That is because it is not obvious how random gates
and flip-flops may be grouped to most efficiently take advantage of
the CLBs. If time allows, it’s possible to review a design manually,
grouping random logic to see how it could be fit in a number of
CLBs. An easier approach is to separately sum the total number of
random gates and flip-flops and calculate the number of CLBs that
would be required to hold them based on analysis of previous LCA
designs.


CLB resources include dedicated flip-flops and combinatorial logic
(function generator). Unless the design matches the need for each
of the two types of resources evenly, it is likely that a design will
require more of either the flip-flops or the function generators. As
may be seen in Table 1, on average 0.84 of the two flip-flops on a
CLB are used and the function generator implements about 3.3
gates.


To find the number of CLBs needed for the random flip-flops,
multiply the number of flip-flops by 0.84. The number of CLBs
required for combinatorial functions equals the number of random
gates divided by 3.3. The larger of these two results should be used
as the number of CLBs required for the random logic in the design.


1.) McCarty, Dennis, “Interpreting FPLD Gate-Density Data”, 
High Performance Systems, 1990 Programmable Logic 
Design Guide, pp. 14-20 





Table 7. TTL Function Logic Resource Requirements

TTL     A1010   A1020   XC3020  XC3030  XC3042  XC3064  XC3090  EPM5128
74131   4.4%    2.4%    6.3%    4.0%    2.8%    1.8%    1.3%    8.6%
74137   4.1%    2.2%    6.3%    4.0%    2.8%    1.8%    1.3%    6.3%
74138   3.7%    2.0%    6.3%    4.0%    2.8%    1.8%    1.3%    6.3%
74139   1.4%    0.7%    3.1%    2.0%    1.4%    0.9%    0.6%    3.1%
74151   1.7%    0.9%    7.8%    5.0%    3.5%    2.2%    1.6%    2.3%
74153   3.1%    1.7%    4.7%    3.0%    2.1%    1.3%    0.9%    0.8%
74157   0.3%    0.2%    1.6%    1.0%    0.7%    0.4%    0.3%    0.8%
74161   7.1%    3.8%    9.4%    6.0%    4.2%    2.7%    1.9%    3.1%
74164   5.8%    3.1%    7.8%    5.0%    3.5%    2.2%    1.6%    6.3%
74166   18.8%   10.1%   9.4%    6.0%    4.2%    2.7%    1.9%    6.3%
74168   12.5%   6.8%    15.6%   10.0%   6.9%    4.5%    3.1%    3.1%
74169   6.4%    3.5%    10.9%   7.0%    4.9%    3.1%    2.2%    3.9%
74174   9.4%    5.1%    4.7%    3.0%    2.1%    1.3%    0.9%    4.7%
74175   2.7%    1.5%    3.1%    2.0%    1.4%    0.9%    0.6%    3.1%
74190   13.6%   7.3%    14.1%   9.0%    6.3%    4.0%    2.8%    3.1%
74191   13.6%   7.3%    14.1%   9.0%    6.3%    4.0%    2.8%    3.1%
74192   13.6%   7.3%    15.6%   10.0%   6.9%    4.5%    3.1%    3.1%
74193   13.6%   7.3%    15.6%   10.0%   6.9%    4.5%    3.1%    3.1%
74194   4.7%    2.6%    10.9%   7.0%    4.9%    3.1%    2.2%    3.1%
74195   3.4%    1.8%    4.7%    3.0%    2.1%    1.3%    0.9%    3.1%
74269   17.6%   9.5%    31.3%   20.0%   13.9%   8.9%    6.3%    6.3%
74273   5.4%    2.9%    6.3%    4.0%    2.8%    1.8%    1.3%    6.3%
74280   3.1%    1.6%    4.7%    3.0%    2.1%    1.3%    0.9%    1.6%
74373   2.7%    1.5%    6.3%    4.0%    2.8%    1.8%    1.3%    6.3%
74377   5.4%    2.9%    6.3%    4.0%    2.8%    1.8%    1.3%    6.3%
74378   4.1%    2.2%    4.7%    3.0%    2.1%    1.3%    0.9%    4.7%
74379   1.4%    0.7%    3.1%    2.0%    1.4%    0.9%    0.6%    3.1%
74674   16.9%   9.2%    14.1%   9.0%    6.3%    4.0%    2.8%    12.5%





Table 8. PAL Utilization

Device: A1010
            Terms
Variables   1       2       3       4       5       6       7       8
1-4         0.34%   0.68%   1.02%   1.36%   1.69%   2.03%   2.37%   2.71%
5-8         0.68%   1.36%   2.03%   2.71%   3.39%   4.07%   4.75%   5.42%
9-12        1.36%   2.71%   4.07%   5.42%   6.78%   8.14%   9.49%   10.85%

Device: A1020
            Terms
Variables   1       2       3       4       5       6       7       8
1-4         0.18%   0.37%   0.55%   0.73%   0.92%   1.10%   1.28%   1.47%
5-8         0.37%   0.73%   1.10%   1.47%   1.83%   2.20%   2.56%   2.93%
9-12        0.73%   1.47%   2.20%   2.93%   3.66%   4.40%   5.13%   5.86%

Device: XC3020
            Terms
Variables   1       2       3       4       5       6       7       8
1-4         1.56%   3.13%   4.69%   6.25%   7.81%   9.38%   10.94%  12.50%
5-8         3.13%   6.25%   9.38%   12.50%  15.63%  18.75%  21.88%  25.00%
9-12        4.69%   9.38%   14.06%  18.75%  23.44%  28.13%  32.81%  37.50%

Device: EPM5128
            Terms
Variables   1       2       3       4       5       6       7       8
1-4         0.78%   1.56%   2.34%   3.13%   3.91%   4.69%   5.47%   6.25%
5-8         0.78%   1.56%   2.34%   3.13%   3.91%   4.69%   5.47%   6.25%
9-12        0.78%   1.56%   2.34%   3.13%   3.91%   4.69%   5.47%   6.25%


Graph 1. Adder Utilization
(Plot of percent of device capacity, 0% to 100%, versus width in
bits, up to 32; legible curves: A1020, XC3020, XC3090, EPM5128.)








Graph 2. Counter Utilization
(Plot of percent of device capacity, 0% to 80%, versus width in
bits; curves: A1010, A1020, XC3020, XC3090, EPM5128.)








Graph 3. Decoder Utilization
(Plot of percent of device capacity, 0% to 35%, versus width in
bits, 8 to 32; curves: A1010, A1020, XC3020, EPM5128, XC3090.)
















[Graph: percent of device capacity (0% to 35%) versus width (8 to 32 bits) for the A1010, A1020, XC3020, XC3090, and EPM5128]


Graph 4. Multiplexor Utilization







Predicting Power in ACT FPGAs





Introduction 


Calculating power in FPGAs is similar to power calculations for
other ASIC devices based on CMOS technology, such as gate
arrays and standard cells. The power dissipation varies depending
on such factors as utilization, average operating frequency, and load
conditions. In contrast, most PALs and PLDs dissipate a fixed
amount of power.


This paper will discuss power dissipation and the concept of
equivalent power capacitance. Equivalent power capacitance
values for ACT™ devices will be stated, and the general approach
to calculating power in an ACT device will be described. The
general equation may be used if internal switching frequencies can
be accurately determined. Since this is often difficult to do, a set of
approximation curves based on average frequency “rules of thumb”
is provided. The graphs provide an upper limit estimate for active
power that is sufficient for most designs.


General Power Equation 


$$P = \left[I_{CCstandby} + I_{CCactive}\right] \cdot V_{CC} + I_{OL} \cdot V_{OL} \cdot N + I_{OH} \cdot (V_{CC} - V_{OH}) \cdot M \qquad (1)$$


where:


$I_{CCstandby}$ is the current flowing when no inputs or outputs are
changing,


$I_{CCactive}$ is the current flowing due to CMOS switching,

$I_{OL}$, $I_{OH}$ are TTL sink/source currents,

$V_{OL}$, $V_{OH}$ are TTL level output voltages,

N equals the number of outputs driving TTL loads to $V_{OL}$, and
M equals the number of outputs driving TTL loads to $V_{OH}$.


Determining N and M depends on the design and the system
I/O, so an accurate determination is problematical. We can
divide the power into two components, static and active, and
consider those components separately.


Static Power 


Actel FPGAs have a small static power component, which results in
lower power dissipation than PALs™ or PLDs. By integrating
multiple PALs/PLDs into one FPGA, an even greater reduction in
board-level power dissipation can be achieved.


The power due to standby current is typically a small component of 
the overall power. For an ACT 1 device, the standby power is 
specified as 50 mWatts worst case. Typically it is less than 20 
mWatts. 


The static power dissipated by TTL loads depends on the number
of outputs driving high or low and the DC load current flowing.
Again, this number is typically small. For instance, a 32-bit bus
sinking 4 mA at 0.33 volts will generate 42 mWatts with all outputs
driving low and 140 mWatts with all outputs driving high. The
actual dissipation will average somewhere between these two
values as I/Os switch states with time.
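The all-low figure for the 32-bit bus can be checked against the TTL load terms of Equation 1. The sketch below is illustrative only: the function name and its parameters are hypothetical, and because the text does not state the source current or the high-level output voltage behind the 140 mWatt figure, those are left as caller-supplied parameters rather than guessed.

```python
def ttl_static_power_mw(n_low, i_ol_ma, v_ol,
                        m_high=0, i_oh_ma=0.0, v_cc=5.0, v_oh=0.0):
    """Static TTL load terms of Equation 1, in milliwatts (mA * V = mW).

    n_low  outputs sink i_ol_ma at an output voltage of v_ol.
    m_high outputs source i_oh_ma and drop (v_cc - v_oh) internally.
    v_cc defaults to 5.0 V, which is an assumption, not a value from the text.
    """
    low_term = i_ol_ma * v_ol * n_low
    high_term = i_oh_ma * (v_cc - v_oh) * m_high
    return low_term + high_term


# 32 outputs, each sinking 4 mA at 0.33 V (the all-low case in the text).
all_low_mw = ttl_static_power_mw(n_low=32, i_ol_ma=4.0, v_ol=0.33)
print(round(all_low_mw, 1))  # 42.2
```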


Active Power 


Power dissipation in CMOS devices is usually dominated by the 
active (dynamic) power dissipation. This component is frequency 
dependent and hence depends on the user's logic and the external 
inputs and outputs. Active power dissipation results from charging 
internal chip capacitances such as that associated with the 
interconnect, unprogrammed antifuses, module inputs, and 
module outputs, plus external capacitance due to PC board traces 
and load device inputs. An additional component of active power 
dissipation is the totem-pole current in CMOS transistor pairs. The 
net effect can be associated with an equivalent capacitance that can 
be combined with frequency and voltage to represent active power 
dissipation. 


Equivalent Capacitance 

The power dissipated by a CMOS circuit can be expressed by the 
equation: 

$$\text{Power}\ (\mu\text{W}) = C_{EQ} \cdot V_{CC}^{2} \cdot f \qquad (2)$$

where:

$C_{EQ}$ is the equivalent capacitance expressed in pF,

$V_{CC}$ is the power supply in volts,

$f$ is the switching frequency in MHz.
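The units in Equation 2 work out conveniently: picofarads times volts squared times megahertz gives microwatts directly. A minimal sketch follows; the function name and the input values are illustrative choices, not Actel data.

```python
def cmos_power_uw(c_eq_pf, v_cc, f_mhz):
    """Equation 2: equivalent capacitance (pF) * Vcc^2 (V^2) * frequency (MHz).

    The 1e-12 from pF and the 1e6 from MHz combine to 1e-6, so the result
    is already in microwatts.
    """
    return c_eq_pf * v_cc ** 2 * f_mhz


# Illustrative values only: 25 pF switching at 10 MHz from a 5 V supply.
example_uw = cmos_power_uw(c_eq_pf=25.0, v_cc=5.0, f_mhz=10.0)
print(example_uw)  # 6250.0
```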


Equivalent capacitance is calculated by measuring Iccactive at a 
specified frequency and voltage for each circuit component of 
interest. Measurements have been performed on both ACT 1 and 
ACT 2 devices over a range of frequencies at a fixed Vcc. 
Equivalent capacitance is frequency independent so the results 
may be used over a wide range of operating conditions. The results 
for ACT 1 and ACT 2 devices are given below in Table 1. 


Table 1. CEQ Values for ACT FPGAs (pF)


                        ACT 1    ACT 2
Modules                  6.3      7.7
Input Buffers           16.0     18.0
Output Buffers          25.0     25.0
Clock Buffer Loads       5.3      2.5





Finding the active power dissipated from the complete design 
requires solving Equation 2 for each component type. This requires 
knowledge of the switching frequency of each part of the logic. The 
exact equation is a piece-wise linear summation over all 
components as shown in Equation 3. 


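$$\text{Power} = \left[(m \cdot C_{EQ} \cdot f_m)_{modules} + (n \cdot C_{EQ} \cdot f_n)_{inputs} + (p \cdot C_{EQ} \cdot f_p)_{outputs} + (q \cdot C_{EQ} \cdot f_q)_{clock\ loads}\right] \cdot V_{CC}^{2} \qquad (3)$$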













where:

m = number of modules switching at $f_m$,

n = number of input buffers switching at $f_n$,

p = number of output buffers switching at $f_p$,

q = number of clock loads on the global clock network,
$f_q$ = frequency of the global clock.
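Equation 3 is simply Equation 2 summed over the four component types. The sketch below shows one way to organize that sum. The `Component` class and the example design (element counts, frequencies, and the 5 V supply) are hypothetical; only the ACT 1 equivalent capacitance values are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Component:
    count: int       # number of elements of this type that are switching
    c_eq_pf: float   # equivalent capacitance per element, in pF
    f_mhz: float     # average switching frequency, in MHz


def active_power_w(components, v_cc):
    """Equation 3: sum(count * C_EQ * f) * Vcc^2, converted from uW to W."""
    total_uw = sum(c.count * c.c_eq_pf * c.f_mhz for c in components) * v_cc ** 2
    return total_uw * 1e-6


# Hypothetical ACT 1 design; counts and frequencies are made up for illustration.
design = [
    Component(count=100, c_eq_pf=6.3, f_mhz=5.0),    # modules
    Component(count=10, c_eq_pf=16.0, f_mhz=10.0),   # input buffers
    Component(count=20, c_eq_pf=25.0, f_mhz=5.0),    # output buffers
    Component(count=50, c_eq_pf=5.3, f_mhz=20.0),    # clock loads
]
total_w = active_power_w(design, v_cc=5.0)  # assumed 5 V supply
print(round(total_w, 3))  # 0.314
```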


Since not all of the modules, inputs, or outputs switch at the
same frequency, a weighted average can be used. For example, a
design consisting of 100 modules switching at 10 MHz and 200
modules switching at 5 MHz would have a weighted average
frequency of:


$$F_{avg} = \frac{(100 \cdot 10) + (200 \cdot 5)}{100 + 200} = 6.67\ \text{MHz}$$
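The same weighted average can be computed for any number of module groups. This is a small sketch with a hypothetical helper name; it reproduces the 100-module and 200-module example from the text.

```python
def weighted_average_frequency(groups):
    """Weighted average of switching frequencies.

    groups is a list of (element_count, frequency_mhz) pairs.
    """
    total_elements = sum(count for count, _ in groups)
    weighted_sum = sum(count * f_mhz for count, f_mhz in groups)
    return weighted_sum / total_elements


f_avg = weighted_average_frequency([(100, 10.0), (200, 5.0)])
print(round(f_avg, 2))  # 6.67
```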


Average Frequency Example 


The complication comes in determining the average switching 
frequency for all of the modules, inputs, and outputs. While some 
portions of a logic design switch at the system frequency, f, most of 
the logic switches at a reduced (or divided) frequency. Consider a 
16-bit synchronous counter with a system input clock equal to F as 
shown in Figure 1. 





Figure 1. 16-Bit Synchronous Counter (outputs Q0 through Q15)


where:

The Q0 output is switching at F/2 (or $F/2^{1}$),

The Q1 output is switching at F/4 (or $F/2^{2}$),

The Q15 output is switching at F/65536 (or $F/2^{16}$).

The average frequency is:

$$F_{avg} = \frac{F}{16}\left(\frac{1}{2^{1}} + \frac{1}{2^{2}} + \cdots + \frac{1}{2^{16}}\right) \approx \frac{F}{16}$$


Thus, the average frequency of an n-bit synchronous counter
clocked at F MHz is approximately F/n.
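The F/n result follows because the series of halved frequencies sums to almost exactly 1. A short numerical check, using a hypothetical helper that returns the average as a fraction of the clock frequency F:

```python
def counter_average_fraction(n_bits):
    """Average output frequency of an n-bit binary counter, as a fraction of F.

    Bit k (k = 1 for Q0) switches at F / 2**k; the average is taken over
    all n_bits outputs.
    """
    series = sum(1.0 / 2 ** k for k in range(1, n_bits + 1))
    return series / n_bits


frac_16 = counter_average_fraction(16)
# The series sums to 1 - 2**-16, so the average is just under 1/16.
print(abs(frac_16 - 1.0 / 16) < 1e-5)  # True
```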


Determining Average Frequency 


Determining the exact average frequency for a design requires a
detailed understanding of the data input values to the circuit. Logic
simulation can provide insight into average frequency, although
simulation is limited by the percentage of real-time stimulus that
can be applied. Fortunately, studies based on large numbers of
ASIC designs have been made to determine rules of thumb for
average switching frequency in logic circuits. These rules are
meant to represent worst case scenarios, hence their use for
predicting the upper limits of power dissipation is generally
acceptable. The rules are given in Tables 2a and 2b for ACT 1 and
ACT 2 devices. Using these rules, we can develop power estimates.


Table 2a. Rules for Determining Average Frequency for ACT 1


1. Module Utilization = 90%
2. Average Module Frequency = F/10
3. 1/3 of I/Os are Inputs
4. Average Input Frequency = F/5
5. 2/3 of I/Os are Outputs
6. Average Output Frequency = F/10
7. Clock Net Loading = 45%
8. Clock Net Frequency = F





Table 2b. Rules for Determining Average Frequency for ACT 2


 1. Module Utilization = 80% of combinatorial modules
 2. Average Module Frequency = F/10
 3. 1/3 of I/Os are Inputs
 4. Average Input Frequency = F/5
 5. 2/3 of I/Os are Outputs
 6. Average Output Frequency = F/10
 7. Clock Net 1 Loading = 40% of sequential modules
 8. Clock Net 1 Frequency = F
 9. Clock Net 2 Loading = 40% of sequential modules
10. Clock Net 2 Frequency = F/2





Estimated Power 


The rules in Tables 2a and 2b are applied to ACT 1 and ACT 2
devices. The resulting power components are detailed in Tables 3a
and 3b, and the total device power is shown in Figures 2a and 2b. The
graphs provide a simple guideline for estimating power. The tables
may be interpolated when your application has different resource
utilizations or frequencies.
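As a rough illustration of how the Table 2a rules combine with Equation 3, the sketch below estimates the module and input power components for an A1010. Several inputs are assumptions rather than statements from this section: the module count (295) and I/O count (57) are the A1010 figures listed elsewhere in this guide, and the 5.5 V supply is assumed because this section does not state the voltage behind Table 3a. With those assumptions the results land close to the 10 MHz module and input entries in Table 3a; the output and clock components depend on loading details not restated here and are not attempted.

```python
V_CC = 5.5               # assumed supply voltage (not stated in this section)
C_EQ_MODULE_PF = 6.3     # ACT 1 module C_EQ from Table 1
C_EQ_INPUT_PF = 16.0     # ACT 1 input buffer C_EQ from Table 1
A1010_MODULES = 295      # assumed A1010 module count
A1010_IO = 57            # assumed A1010 user I/O count


def act1_module_power_w(f_mhz, modules=A1010_MODULES, v_cc=V_CC):
    """Rules 1 and 2 of Table 2a: 90% of modules switching at F/10."""
    used_modules = 0.90 * modules
    return used_modules * C_EQ_MODULE_PF * (f_mhz / 10.0) * v_cc ** 2 * 1e-6


def act1_input_power_w(f_mhz, io_pins=A1010_IO, v_cc=V_CC):
    """Rules 3 and 4 of Table 2a: one third of I/Os are inputs at F/5."""
    inputs = io_pins / 3.0
    return inputs * C_EQ_INPUT_PF * (f_mhz / 5.0) * v_cc ** 2 * 1e-6


module_w = act1_module_power_w(10.0)
input_w = act1_input_power_w(10.0)
print(round(module_w, 3), round(input_w, 3))  # 0.051 0.018
```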










Table 3a. Power Components for ACT 1 (Watts) 
































F (MHz)  Module Power  Input Power  Output Power  Clock Power  Total Power (watts)
A1010 
1 0.005 0.002 0.009 0.019 0.035 
2 0.010 0.004 0.017 0.038 0.069 
5 0.025 0.009 0.043 0.096 0.173 
10 0.051 0.018 0.085 0.192 0.347 
15 0.076 0.027 0.128 0.289 0.520 
20 0.101 0.036 0.171 0.385 0.693 
25 0.126 0.046 0.213 0.481 0.866 
A1020 
1 0.009 0.002 0.010 0.035 0.057 
2 0.019 0.004 0.021 0.071 0.114 
5 0.047 0.011 0.052 0.176 0.286 
10 0.094 0.022 0.103 0.353 0.572 
15 0.141 0.033 0.155 0.529 0.858 
20 0.188 0.044 0.207 0.705 1.144 
25 0.235 0.055 0.258 0.882 1.430 


Table 3b. Power Components for ACT 2 (Watts)


F (MHz)  Module Power  Input Power  Output Power  Clock Power  Total Power (watts)
A1280 
1 0.011 0.013 0.021 0.027 0.072 
2 0.023 0.025 0.042 0.054 0.144 
5 0.057 0.063 0.105 0.136 0.360 
10 0.113 0.125 0.210 0.272 0.720 
20 0.227 0.250 0.419 0.544 1.440 
30 0.340 0.375 0.629 0.815 2.159 
40 0.453 0.500 0.839 1.087 2.879 
A1240 
1 0.006 0.009 0.016 0.015 0.046 
2 0.013 0.019 0.031 0.030 0.092 
5 0.032 0.046 0.078 0.074 0.231 
10 0.064 0.093 0.156 0.149 0.461 
20 0.127 0.186 0.311 0.298 0.923 
30 0.191 0.279 0.467 0.447 1.384 
40 0.255 0.372 0.623 0.596 1.845 
A1225 
1 0.004 0.007 0.012 0.009 0.033 
2 0.008 0.015 0.025 0.019 0.066 
5 0.020 0.037 0.062 0.047 0.166 
10 0.040 0.074 0.124 0.094 0.332 
20 0.080 0.148 0.249 0.187 0.664 
30 0.120 0.222 0.373 0.281 0.996 
40 0.160 0.297 0.497 0.375 1.329 
50 0.200 0.371 0.621 0.468 1.661 
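

For frequencies between the tabulated rows, linear interpolation is one straightforward option, since every component in the tables scales with F. The sketch below uses the A1010 total power column from Table 3a; the function name is hypothetical.

```python
# (F in MHz, total power in watts) for the A1010, copied from Table 3a.
A1010_TOTAL_W = [
    (1, 0.035), (2, 0.069), (5, 0.173), (10, 0.347),
    (15, 0.520), (20, 0.693), (25, 0.866),
]


def interpolate_power(table, f_mhz):
    """Linearly interpolate power between the two nearest tabulated rows."""
    if not table[0][0] <= f_mhz <= table[-1][0]:
        raise ValueError("frequency outside the tabulated range")
    for (f0, p0), (f1, p1) in zip(table, table[1:]):
        if f0 <= f_mhz <= f1:
            return p0 + (p1 - p0) * (f_mhz - f0) / (f1 - f0)


estimate_w = interpolate_power(A1010_TOTAL_W, 12.5)
print(round(estimate_w, 4))  # 0.4335
```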
























































































Figure 2a. ACT 1 Power Estimates













































































Figure 2b. ACT 2 Power Estimates 













Design Tools


Converting Multiple PLD Designs to FPGAs 











Converting Multiple PLD Designs to FPGAs





Introduction 


The Field Programmable Gate Array (FPGA) is a recent
technology development that provides system level benefits over
conventional Programmable Logic Devices (PLDs). As the PLD
once replaced MSI devices, the FPGA now continues the evolution
towards system level integration at lower cost. The transition to
FPGAs is not trivial, as architectural differences must be
considered. However, with the right automated tools, the task of
converting multiple PLD designs to FPGAs is fast and easy.


Actel’s FPGAs 


Actel has developed a low-power CMOS FPGA employing the
PLICE™ antifuse technology [1]. The two members of the ACT™ 1
family are the A1010 and the A1020, offering 1200 and 2000 usable
gates respectively. The basic element in the arrays is called a Logic
Module (LM), which can be easily configured to implement
standard Boolean functions. Based on extensive design analysis, it
has been determined that an LM is equivalent to 3.22 gates [2]. The
small granularity of the LM and the ACT 1 architecture allow 90%
utilization of the device as well as easy migration to masked gate
arrays. The Actel devices are programmable and hence incur no
risk or NRE.


The A1280 (8000 gate equivalent) FPGA is part of the next 
generation ACT 2 family. Compared to the A1020, the new A1280 
offers twice the user I/O and four times the number of logic 
modules. The LM itself has been enhanced for greater logic 
capacity and improved sequential element performance. Current 
estimates indicate the new logic module is equivalent to 4.8 gates 
resulting in a doubling of performance compared to ACT 1. 


Actel's ALES 1 


The Actel Logic Enhancer/Synthesizer (ALES 1) automatically 
translates PLD files and optimizes the logic equations for the Actel 
FPGA architecture. ALES 1 accepts PALASM®2 source files 
directly or generated from ABEL™, CUPL™ or LOG/IC™ 
compilers. In addition, ALES 1 can optimize Actel netlist files via 
schematics or high level compilers such as Minc's PGADesigner™. 


Logic Density 


Statistical analysis of over fifty customer PLD files with various
simple PLD types such as 16R8, 20L8 and 22V10 has yielded an
average size of 35 ACT 1 LMs (or 24 ACT 2 LMs) per PLD when
synthesized with ALES 1. Eleven of these files used a generic PLD
device type allowing the user unlimited pins and logic. (The largest
of these generic designs used 70 LMs.) The capacity of each Actel
FPGA can now be measured in terms of this aggregate PLD.


Table 1 indicates the total number of LMs, user I/O and equivalent 
PLDs for each of the three Actel FPGAs. A utilization factor of 
90% was used for this calculation. The number of PLDs for the 
A1280 is based on the larger ACT 2 LM. 


Table 1. Actel FPGA Specifications 








Device    # LM    # I/O    PLDs/Device
A1010      295      57          7
A1020      546      69         14
A1280     1232     140         46
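

The PLDs/Device column can be reproduced from the module counts, the 90% utilization factor, and the average PLD sizes of 35 ACT 1 LMs or 24 ACT 2 LMs. The sketch below assumes the result is rounded down to a whole PLD, which matches the tabulated values; the dictionary and function names are hypothetical.

```python
import math

LMS_PER_PLD = {"ACT 1": 35, "ACT 2": 24}   # average PLD size per family
DEVICES = {                                 # device: (logic modules, family)
    "A1010": (295, "ACT 1"),
    "A1020": (546, "ACT 1"),
    "A1280": (1232, "ACT 2"),
}


def equivalent_plds(device, utilization=0.90):
    """Whole PLDs that fit in the usable modules (rounding down is assumed)."""
    modules, family = DEVICES[device]
    return math.floor(modules * utilization / LMS_PER_PLD[family])


for name in DEVICES:
    print(name, equivalent_plds(name))
# A1010 7
# A1020 14
# A1280 46
```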


Table 2 compares board level statistics for a design using 46 PLDs
versus an equivalent design using a single A1280. Using PLDs
would yield a printed circuit board area of approximately 280 cm²,
compared to 36 cm² for the A1280, a reduction of 87%. Static
supply current for a single CMOS PLD is typically 50 mA. For
46 devices at 5 V this would represent 11.5 watts. On the other hand, the
A1280 draws only 1.7 milliwatts static power. There are also
substantial benefits reaped from device count reduction, such as
system reliability, assembly simplification and reduced inventory.


Table 2. Board Level Comparison 





Technology    Devices    Area       Static Power
PLD             46       280 cm²    11500 mW
A1280            1       36 cm²     1.7 mW
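

The board-level figures can be checked with a few lines of arithmetic. In this sketch the 5 V supply is an assumption (it is the value that turns 46 devices at 50 mA into 11.5 watts), and the helper name is hypothetical.

```python
def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


area_reduction = percent_reduction(280.0, 36.0)   # board area in cm^2
pld_static_mw = 46 * 50.0 * 5.0                   # 46 PLDs * 50 mA * assumed 5 V

print(round(area_reduction))  # 87
print(pld_static_mw)          # 11500.0
```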





It is important to consider the particular PLD circuit when
contemplating translation to an FPGA as there are conditions
which will limit optimal utilization. For example, PLD designs that
are I/O intensive may use less than half of the LMs as there is a
fixed ratio of I/O pins to total LMs for each FPGA. PLDs
interfacing between two other non-PLD devices fall in this
category.


On the other hand, PLDs used in a pipeline structure are prime
candidates for conversion to FPGAs as they have a better balance
between I/Os and gates.


Architecture Comparison 


The simple PLDs have a common AND-OR sum of products
structure feeding a dedicated register. For example, the 16R8 has
eight D flip-flops, each of which has eight product terms consisting
of 16 input/feedback signals and their complements. PLDs are
flip-flop limited and as a result, the utilization of available
combinatorial gates is usually poor [2].


The Actel ACT 1 family has LMs which can be configured as any 
one of a myriad of macro functions. The absence of dedicated logic 
functions results in more efficient use of resources compared to 
PLDs. The ACT 2 family has an enhanced logic module with 
increased combinatorial fan-in, optimized flip-flops and latches 
allowing logic level compression in many cases. 










System Performance 


System performance is an important issue when considering PLD 
design migration to FPGAs. A good measure of system speed is the 
maximum frequency at which the system will operate. The 
maximum frequency is found by determining worst case path delay 
between clock edges considering the flip-flop setup time. 


Performance for the 16R8 is in the region of 20 MHz to 50 MHz
depending on speed grade and technology. To accommodate large
equations, more product terms are used. As a result, system
performance will not be affected by larger functions, but they may
not fit in the given PLD. In this case, multiple devices will be
required, which reduces system speed.


FPGA performance is dependent on the application. Furthermore,
because of the smaller granularity LMs, large equations in FPGAs
are implemented with increased logic levels, which naturally
impacts system performance. For example, a 10-input product term
in the ACT 1 family would require three levels of logic. The
resulting system speed of such a path connected to a flip-flop,
considering routing and setup time, is 20 MHz [3]. However, unlike
PLDs, there is no static power penalty.


By taking advantage of the enhanced LM, the ACT 2 family can
compress logic in many types of circuits. For instance, the path
mentioned above can be implemented in only two modules
yielding a system speed of 40 MHz.
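

System speed and worst case register-to-register path delay are reciprocals, so the two quoted speeds imply total path budgets of 50 ns and 25 ns respectively. The helpers below are a hypothetical sketch of that conversion; the path delay is taken to include clock-to-output, logic, routing, and setup time.

```python
def clock_period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def max_frequency_mhz(path_delay_ns):
    """Maximum clock frequency for a given worst case path delay."""
    return 1000.0 / path_delay_ns


print(clock_period_ns(20.0))    # 50.0
print(clock_period_ns(40.0))    # 25.0
print(max_frequency_mhz(50.0))  # 20.0
```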


The Actel devices are performance competitive with PLDs. Only 
the very fastest applications need be implemented in PLDs. Most 
applications can reap the benefits offered by FPGAs — lower 
power, less board space and greater design flexibility. 


Translation Process 


As mentioned earlier, PLDs have large latitude in product terms
but only allow two logic levels. Therefore, logic equations need to
be of this form in order to fit in a particular PLD. FPGAs, on the
other hand, favor more logic levels and thus require the logic
equations to be restructured to suit them. How do you modify these
equations when converting to FPGAs? Actel uses ALES 1, intelligent
software that understands the intricacies of the target technology.


ALES 1 accepts the same PALASM2 source files previously used to 
define the PLDs. There is no need to re-enter the logic equations in 
a new syntax or make any changes to the file. Clock buffers, 
asynchronous set and reset and output enable functions are all 
taken care of automatically. The user may optimize for area or 
speed, specify output loading and prevent buffering on specific 
signals. 


The conversion process to Actel FPGAs begins by running ALES 1 
on each separate PLD design file. An optimized netlist file will be
generated for each PLD. Within the schematic capture
environment, a symbol is created to represent each of these 
optimized files. A top level schematic is then drawn using the newly 
created symbols. These symbols are interconnected along with any 
other Actel components. Finally, the netlister is invoked to merge 
the optimized PLD files with the top level schematic to produce the 
complete Actel design netlist. Layout and antifuse generation then 
follow using the automated Actel development system. 


When to Convert 


The ideal time to convert to an FPGA is when a design change is
required to upgrade the current system. Design changes may be
greatly simplified due to the flexible nature of the FPGA and the
entry tools. Using ALES 1, it is a trivial task to change the device
type in the PLD file to a generic one, and add extra logic and/or
pins. Multiple clock or asynchronous reset pins that are
unavailable in the chosen PLD may be needed, and these are easy to
implement with Actel FPGAs. Additional logic can also be
painlessly integrated in the top level schematic.


Conversion to an FPGA is also appropriate when a cost and/or 
space savings is necessary. Regaining a competitive position in the 
market and obtaining product portability are two strong motivators 
for such a redesign. In addition to lower device count, the printed 
circuit board will inevitably be smaller and simpler with an FPGA. 


Summary 


Converting multiple PLD designs to Actel FPGAs will reduce 
system cost, size and power requirements, while enhancing 
reliability and production control without risk or NRE. ALES 1 
allows this fast and easy migration using the original PLD files and 
familiar design tools. 


References 


1.) Haines, Andrew, “Field-Programmable Gate Array with
Non-Volatile Configuration”, Microprocessors and Microsystems,
Vol. 13, No. 5 (June 1989), pp. 305-312


2.) McCarty, Dennis, “Interpreting FPLD Gate Density Data”,
Electronic Engineering Times, Volume XI, No. 583a (1990),
pp. 14-20


3.) ACT Family Field Programmable Gate Array Databook, Actel
Corporation, Sunnyvale, CA, USA (March 1991)


ACT, ALES, and PLICE are trademarks of Actel Corporation. 


ABEL, CUPL, LOG/IC, PALASM, and PGADesigner are 
trademarks of their respective manufacturers. 





Design Examples


A TTL Designer’s Guide to Using FPGAs
Designing Counters with the ACT 2 Architecture
Implementing Fast Counters with ACT 2 FPGAs
Designing Adders and Accumulators with the ACT 2 Architecture
Increase FPGA Performance Using Module/Speed Trade-Offs
FPGAs are Better for State Machines than PLDs
Page Mode DRAM Controller
Four-Channel DMA Controller
Using FPGAs for Digital PLL Applications





A TTL Designer’s Guide to Using FPGAs





Actel FPGAs offer many advantages over traditional technologies 
such as TTL. The advantages include greater reliability and 
reduced board space and power from the ability of FPGAs to 
integrate large amounts of logic into one device. For example, a 
single A1280 FPGA holds the equivalent of 165 MSI TTL devices 
(assuming 70 gates/MSI device). That means not only a smaller 
PCB, but a simpler one since most of the design’s connections are 
made inside the FPGA by the 100% automatic place and route 
software. That beats costly PCB design and fab. 


It’s Easy to Start 


Designers who are used to using TTL components may see some of 
the advantages of using Actel FPGAs in their designs and not 
realize how easy it is to begin using them. You don’t have to know 
anything about the internal workings of the FPGA. In fact, the 
schematic entry and simulation process is the same as it was with 
TTL. 


Actel Library 


Actel provides a library with the system for popular schematic 
design tools. The library contains both hard macros and soft 
macros. Hard macros are similar to SSI components. They form 
the basic functional building blocks such as gates and flip-flops. 
Many Actel hard macros are identical in function to TTL parts 
though they have different names. The Actel databook contains a
cross reference guide showing the names of hard macro
components that match functionally to TTL. 


Soft macros are more complex functions that are built from a 
number of hard macros. They include counters, decoders, and 
adders of various sizes. Some of the soft macros in the library have 
identical functions to MSI TTL parts. These may be identified by 
their names beginning with TA instead of 74. The rest of the name 
matches that of the TTL name. Other soft macros offer generic 
logic functions. 


Creating Custom Soft Macros 


All soft macros are easily copied and modified, and there is no limit 
to the number of custom soft macros you may add to the library. 
Should you need a TTL function for which there is no near 
equivalent in the library, it is easy to build it from scratch. Simply 
copy the schematic as shown in the TTL databook using 
components such as gates and flip-flops from the Actel hard macro 
library. 


As you gain familiarity with the Actel hard macros you will find 
instances where you can do a more efficient design than that found 
in the TTL databook. For example, if the book shows an AND gate 
driving an OR gate you may find a single hard macro containing 
both the AND and OR functions. Using such multifunction 
macros is very efficient because you get more and faster logic from 
the macro. Compare the 74AS161 counter schematic from a TTL 
databook with the TA161 from the Actel library in Figure 1. 


Figure 1. TA161 Counter


Three State Design 


Many TTL parts have three-state outputs allowing them to be
connected to a common bus. Three-state functions don’t work well
with ASICs or FPGAs because they tend to be slow and inefficient.


You don’t have to give up using buses in your designs. Simply 
implement them using multiplexors as shown in Figure 2. 
Multiplexors are very efficient on Actel parts. For example, you can 
create an eight-bit bus with four possible drivers using less than 3% 
of an A1010, Actel’s smallest part. 
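

The multiplexor approach can be modeled behaviorally: each bus bit is a 4:1 multiplexor choosing among the four possible drivers, and the select lines take over the role of the output enables. The function names below are hypothetical, and the utilization estimate assumes one logic module per 4:1 multiplexor bit and 295 modules in an A1010, neither of which is stated in this paragraph.

```python
def mux4_bit(a, b, c, d, s1, s0):
    """One bit slice: a 4:1 multiplexor with select lines S1 and S0."""
    return (a, b, c, d)[(s1 << 1) | s0]


def mux4_bus(a, b, c, d, s1, s0, width=8):
    """A bus of `width` bits built from one 4:1 multiplexor per bit."""
    out = 0
    for bit in range(width):
        selected = mux4_bit((a >> bit) & 1, (b >> bit) & 1,
                            (c >> bit) & 1, (d >> bit) & 1, s1, s0)
        out |= selected << bit
    return out


# Select driver C (S1 = 1, S0 = 0) from four 8-bit sources.
bus_value = mux4_bus(0x11, 0x22, 0x33, 0x44, s1=1, s0=0)
print(hex(bus_value))  # 0x33

# Assumed: one module per bit, 295 modules in an A1010.
utilization_pct = 100.0 * 8 / 295
print(utilization_pct < 3.0)  # True
```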








[Figure panels: Actel FPGA Implementation and Discrete Technology Implementation, each with inputs A[0], B[0], C[0], and D[0]]


Figure 2. Least Significant Bit of a Bus with Four Possible Drivers


Design Tips


If you use a soft macro, but don't need all the outputs, you don't 
need to modify the macro. The Actel software contains a program 
which will automatically eliminate any unused modules before the 
design is placed and routed. 


If you use a soft macro or a hard macro that has inputs you don't 
need, the situation is different. The software won't allow inputs to 
be left unconnected, so some designers simply tie unused inputs to
Vcc or GND. That is permitted, but a better solution would be to
select a hard macro that only has inputs you need or modify the soft 
macro to eliminate the unused function. 


For example, the TA161 counter has a load function and four data 
inputs. Rather than tying off those pins, a better solution would be 
to make a copy of the counter and modify it as shown in Figure 3. 
That will allow the wiring resources on the chip to be used for 
things other than power and ground connections. 








Figure 3. Library Symbol and Modified Symbol (modified symbol labeled NEW161)










Designing Counters with the ACT 2 Architecture





Perhaps the most common digital logic function used is the 
synchronous binary counter. Regardless of the technology 
employed to implement counters, they are found in every type of 
application. Designers who are familiar with counter design using 
discrete logic are used to being able to predict the performance of 
the counter by reading a datasheet. When using FPGAs there are 
more things to consider. The following describes some of the 
considerations you should be aware of when designing a counter 
yourself, or modifying a counter in the ACT™ 2 soft macro library.


Performance Factors 


The advantages of using FPGAs over discrete devices are well 
known. In order to maximize the benefits of high integration and 
low power, it is important for the designer to understand how to 
best implement the counter. The performance of the counter is 
variable, so it is important to do the optimal design. There are four 
criteria that influence performance of logic in FPGA designs. They 
are listed below in descending order of importance: 


• Levels of Logic — The fewer the number of combinatorial 
logic levels between flip-flops, the faster the counter 
frequency. 


• Fan-Out — Propagation delays in FPGAs are sensitive to
fan-out. Limiting fan-out on individual nets improves 
performance. 


• Fan-In — Measures the number of nets connected to a logic 
module’s inputs. Library functions with heavy fan-in 
efficiently utilize the logic of the module and aren't a 
problem when used sparingly. Too many high fan-in macros, 
though, can congest routing and reduce performance. 


• Number of Modules — Fewer logic modules allow them to
be placed closer to each other. Shorter distances between 
modules speed the connection paths. 


While each of the above considerations is important in itself, it 
must be remembered that they are interrelated and an 
improvement in one may cause a degradation in another. For 
example, limiting the number of logic levels tends to increase 
fan-in. A balance must be found among them so that none
becomes a drag on performance. 


Before proceeding to a detailed description of a sample design it 
would be useful to review some fundamentals of the ACT 2 
architecture. In the design of a counter, or any soft macro, device 
architecture should always be considered to obtain the best results. 


Two Types of Modules 


The ACT 2 architecture features two types of modules. 
Combinatorial modules (C-mods) are used to implement any 
combinatorial function in the ACT 2 library. Sequential modules 
(S-mods) can be used for either sequential functions (e.g. flip-flops) 
or combinatorial functions or both. When the S-mod is used to
implement both a sequential and a combinatorial function (e.g. a
gate followed by a flip-flop) it is being used in the most efficient
way. 


The module types exist in roughly equal numbers on ACT 2 devices 
and the Place and Route software will automatically select the 
appropriate module for each library component in the schematic. It 
is up to the designer to understand how to select components from 
the library so as to take best advantage of the logic in the modules. 


As the sample design will show, you can construct a 10-bit counter 
with only one level of combinatorial logic between flip-flops and a 
16-bit counter with only two combinatorial levels. If some counter 
outputs may be active-low or if additional modules are used for 
redundant logic (e.g. some bits have both an active-high and an 
active-low output), then larger counters may be designed without 
additional logic levels using five-input gates. Such decisions should 
consider the implications on module count and fan-out as detailed 
below. 


Sample Design 


The sample design for a 16-bit synchronous loadable binary 
counter with a count enable will serve to illustrate some of the 
considerations designers should be aware of when using counters in 
ACT 2 FPGA designs. The functional description for the counter 
appears in Table 1. 


Table 1. Counter Function 








RST   LD   CE   CLK   Q
 0    X    X    X     0
 1    1    X    ↑     D
 1    0    1    X     Q
 1    0    0    ↑     Q+1
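

The function table can be expressed as a small behavioral model, which is handy for checking a schematic simulation against the intended behavior. The sketch below is an interpretation rather than Actel library code: it assumes load and count take effect on the rising clock edge, that reset overrides everything when RST is 0, and that the count enable is active low. The function name is hypothetical.

```python
def counter_next(q, d, rst, ld, ce, width=16):
    """Next counter value at a rising clock edge, following Table 1.

    rst = 0 clears the counter, ld = 1 loads d, ce = 1 holds the current
    value (the count enable is active low), and otherwise the counter
    increments and wraps at 2**width.
    """
    mask = (1 << width) - 1
    if rst == 0:
        return 0
    if ld == 1:
        return d & mask
    if ce == 1:
        return q
    return (q + 1) & mask


print(counter_next(q=5, d=0, rst=1, ld=0, ce=0))       # 6
print(counter_next(q=0xFFFF, d=0, rst=1, ld=0, ce=0))  # 0
```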





Using the Modules 


The counter design makes extensive use (bits 0 through 5) of a 4:1 
multiplexor driving a flip-flop as depicted in Figure 1. Both the 
multiplexor and the flip-flop will be combined into a single S-mod 
by the ALS software. The select lines on the multiplexor are 
operated by the load control (S1) and by the count enable and carry 
from the lower order bits (S0). The multiplexor data inputs are used
for data to be loaded, held, or incremented. 


C-mods are used as AND-EXORs and for ANDs to qualify the 
count function. The AND function is also used to bring the count 
enable (CE) to the multiplexor via the select line in bits Q3 and
above. Using both the select inputs as well as the data inputs to
AND lower order bits allows for more paralleling resulting in fewer 
levels of logic. 


For the bits Q6 through Q15, an S-mod macro with both a 4:1 
multiplexor and an OR gate driving one select line is used to allow 





O 1991 Actel Corporation 


4-5 


(¡El 








for more parallel propagation of the lower bits. Figure 2 shows the 
implementation of the most significant bit of the counter. 


The S-mod OR gate is used as a two-input NAND with active-low 
inputs which are, in turn, driven by NAND gates to propagate the 
lower bits and the count enable. The active-low output of the 
built-in two-input gate (OR used as a NAND) is adjusted for by 
shifting the position of the multiplexor data inputs. 


Most four-input gates are implemented with a single C-mod, but a 
four-input NAND with no bubbled inputs requires two modules. 
The limitation is avoided by using a NAND with a bubbled input. 
The count enable is active low, so it may be used to drive a bubbled 
input on a gate. Active-low counter bits are used to drive other 
bubbled gate inputs. 




















Figure 1. Counter Bit Q5


Figure 2. Counter Bit Q15











Levels of Logic 


As may be seen in the complete counter schematic (Figure 3), two 
bits (Q0 and Q6) use inverters. The Q0 inversion toggles the 
flip-flop. A toggle flip-flop could have been used instead of a D 
flip-flop, but it could not have been combined with the multiplexor 
into a single S-mod. Moreover, the inverter output is available as a 
resource to share the fan-out load with the flip-flop and to allow the 
use of bubbled inputs on gates whenever it is desirable. 


It could be argued that the use of the inverted output to drive gates 
causes the lower level bits to use two levels of combinatorial logic 
when it is not necessary. For a design of ten bits or less, the point 
would be valid because no path requires more than one 
combinatorial level. In the example design, however, two levels are 
already required by the upper bits and the improvement in fan-out 
from the use of the inverter output at no additional cost in module 
count makes the practice worthwhile. 


Limiting Fan-out with Redundant Logic 


The paths in the design that are most likely to limit performance 
are those with the largest number of logic levels and the highest 
fan-out. In general, when fan-outs exceed 9 on a critical path, 
redundant logic is usually called for. For lower fan-outs the 
decision to use redundant logic is less clear-cut and must be 
balanced by considering both the cost in additional logic modules 
and the added fan-out on the outputs driving the redundant logic. 


In the sample design, two redundant modules were added to 
illustrate the concept. One is an XNOR gate whose output is the 
inversion of Q1. The other is a four-input NAND gate which 
propagates flip-flop outputs. No fan-out in the design exceeds 
seven, and the worst-case path is through the redundant gate, whose 
inputs are driven by pins with fan-outs of seven, six, six, and five, 
and whose output fan-out is four. A total of 48 modules were used in 
the design. 


Chip-Level Design Considerations 


The trade-offs involved in soft macro designs such as the counter 
example cannot be evaluated outside the context of the overall 
design. Decisions such as whether to use redundant logic or high 
fan-in modules must consider the entire design. For example, if all 
the modules are already being used, redundant logic may not be an 
option. If the design has a significant number of high fan-in macros, 
use of additional high fan-in macros in a counter may cause routing 
congestion. 


The ALS tools such as the Validator and automatic Place and 
Route can rapidly provide answers to questions about design 
capacity and routability. 





Figure 3. 16-Bit Counter Schematic 













Implementing Fast Counters with ACT 2 FPGAs 





Introduction 


Counters are one of the most important functions designed into 
FPGAs and as such they have recently been used as a benchmark to 
compare one technology or product to another. As shown in a 
previous technical brief¹, the performance of a particular FPGA 
implementation is a function of the amount of logic used. Faster 
implementations use more logic and slower implementations use 
less. This paper will show a technique for making the highest 
possible performance counters in the ACT™ 2 family using a 
toggle prediction method to enhance performance. 


An important realization in designing high performance counters 
is the fact that the least significant bits (LSBs) of the counter 
change the most frequently, and higher order bits change much less 
often. This fact can be used to optimize counter performance by 
making sure the least significant bits enter the logic trees at the 
lowest (fastest) level. Higher order bits can enter further up the tree 
since they have a longer time to propagate through the logic. Let’s 
consider the design of a 6-bit counter using this technique. The 
counter will be loadable with asynchronous clear. It will also be a 
down counter, suitable for timing or address generation in a typical 
digital design. An up counter is an easy modification of the design; 
conceptually the two are similar. 


Least Significant Bit 


The counter needs to have a least significant bit which can toggle at 
the highest possible rate. In the ACT 2 family the sequential logic 
module allows a 4-input multiplexor with gated select lines and a 
D-type flip-flop to be implemented in a single level of logic (see 
Figure 1). This logic module can be used to construct a least 
significant bit with clear, load, and count enable as shown in 
Figure 2A. 


Data can be loaded into the register when the load enable (LD) 
signal is high, selecting the D2 or D3 input on the multiplexor. The 
count enable signal (CNT) is used to toggle Q0 when counting is 
enabled. If LD is disabled, the multiplexor input is either D0 or D1 
depending on the state of Q0. If Q0 is a 0 and CNT is a 1, the 
register will be loaded with a 1 from the D0 input. If Q0 is a 1 and 
CNT is a 1, the register will be loaded with a 0 (NOT CNT) from the 
D1 input. Thus Q0 toggles if CNT is a 1. If CNT is a 0, Q0 will hold 
rather than toggle, since the D0/D1 inputs will now be 0 and 1 
respectively. This gives us a least significant bit which needs only a 
single level of logic to operate. Since the next most significant bit 
will only toggle when Q0 is a 0 (remember we are implementing a down 
counter), we will have 1 extra clock cycle to develop the signals for 
Q1's next state. 
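
The behaviour described for the least significant bit can be sketched as a small model. This is an illustration only: the function names are hypothetical, and tying the load data to both D2 and D3 is an assumption that makes the loaded value independent of Q0.

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """4:1 multiplexor of the sequential module, selected by S1 and S0."""
    return (d0, d1, d2, d3)[2 * s1 + s0]

def next_q0(q0, cnt, ld, data):
    """Value clocked into the Q0 flip-flop on the next clock edge.

    S1 is driven by LD and S0 by the current Q0.
    D0 = CNT and D1 = NOT CNT, as described in the text.
    D2 = D3 = data is an assumption of this sketch.
    """
    return mux4(cnt, 1 - cnt, data, data, s1=ld, s0=q0)
```

With LD low the result equals Q0 XOR CNT, so the bit toggles while counting is enabled and holds otherwise; with LD high it takes the load data regardless of Q0.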


Figure 2B shows the implementation of Q1 using the ACT 2 
module. It is similar to the LSB except Q1 can be inverted to 
develop the toggle signal. Notice that this signal is selected only 
when Q0 and /CNT are low. This keeps the slower /Q1 signal from 
participating in the logic function until it has settled, ensuring fully 
synchronous operation. This technique will be the cornerstone of 
the most significant bits (MSBs), where slower signals will be gated 
off until they have had sufficient time to settle and they are needed 
to compute the toggle of the associated counter bit. 


Figure 2C shows the entire 2-bit prescaler (CNT2P) for fast 
counters. Added to Q0 and Q1 are 3 registers to source CNT, /CNT 
and LD. These will be used to reduce fan-out in the fastest counter 
implementation. 








Figure 1. ACT 2 Family Sequential Logic Module 








Figure 2A. Counter Bit Q0 


Figure 2B. Counter Bit Q1 


Figure 2C. CNT2P Macro 





Most Significant Bit 


A 4-bit macro for the MSBs is given in Figure 3. It uses the LSB 
from the CNT2P macro connected to CT0 (on S0A) and CNT (on 
S0B) to enable the multiplexor inputs used for toggling, similar to 
the CNT2P macro. In addition, the next LSB is connected to CT1 
(on B of AXB1) and the counter bits in the macro (Q0-Q2) are used 
to gate the input to the XOR (see Q3 for example) which 
determines whether the register bit holds or inverts. A ripple carry 
input (RCI) is also used to allow results from previous stages to 
participate. This technique allows MSBs at least 4 clock cycles (since 
the LSBs transition from 00 to 00 in 4 clock cycles worst case) to 
develop the input to the associated bits in the counter. A 6-bit 
counter is shown in Figure 4. 
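
The timing budget claimed for the upper bits can be checked with a behavioural model of a down counter. The sketch below does not model the macros themselves; the function names and the default width of 6 are chosen only for illustration.

```python
def toggle_conditions_hold(width=6):
    """In a down counter, bit i toggles exactly when all lower bits are 0."""
    mask = (1 << width) - 1
    value = mask
    for _ in range(2 * (1 << width)):
        nxt = (value - 1) & mask
        toggled = value ^ nxt
        for bit in range(width):
            lower_all_zero = (value & ((1 << bit) - 1)) == 0
            if bool((toggled >> bit) & 1) != lower_all_zero:
                return False
        value = nxt
    return True

def min_upper_change_gap(width=6):
    """Fewest clocks between successive changes of any bit above Q1."""
    mask = (1 << width) - 1
    value, last, gaps = mask, None, []
    for step in range(4 * (1 << width)):
        nxt = (value - 1) & mask
        if (value ^ nxt) >> 2:  # some bit at position 2 or higher changed
            if last is not None:
                gaps.append(step - last)
            last = step
        value = nxt
    return min(gaps)
```

For a continuously enabled counter the gap is 4 clocks, which is the budget available to the logic feeding bits Q2 and above.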


The limiting frequency for the 6-bit counter is based on the longest 
clock-to-clock delay in the design. There are four basic paths in the 
design: the maximum internal clock frequency of the device, a 
single-level path from Q0 to Q0-5, a 2-level path from Q1 to Q1-5, 
and a 3-level path from Q2-5 to Q2-5. Estimates for each path are 
given below for an A1280-1 using datasheet numbers and a 1.2 
derating factor. 








Path         Datasheet Parameter(s)                 Value                                          Requirement 
Max. Clock   2*tW*1.2                               2*7.5ns*1.2 = 18 ns                            Must be <= 1 Clock Cycle 
Q0 Delay     [tPD(FO=6) + tSU]*1.2                  [9.25ns + 1ns]*1.2 = 12.3 ns                   Must be <= 1 Clock Cycle 
Q1 Delay     [tPD(6) + tPD(1) + tSU]*1.2            [9.25ns + 5.5ns + 1ns]*1.2 = 18.9 ns           Must be <= 2 Clock Cycles 
Q2-5 Delay   [tPD(6) + tPD(1) + tPD(1) + tSU]*1.2   [9.25ns + 5.5ns + 5.5ns + 1ns]*1.2 = 25.5 ns   Must be <= 4 Clock Cycles 


Thus: Maximum frequency is limited by the internal clock rate of 55 MHz. 
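
The arithmetic in the table can be reproduced with a short calculation. The delay values and the 1.2 derating factor are those quoted above for the A1280-1; the function and variable names are illustrative only. Each path delay is divided by the number of clock cycles it is allowed, and the largest quotient sets the minimum clock period.

```python
DERATING = 1.2

def derated(*delays_ns):
    """Sum of datasheet delays (ns) multiplied by the derating factor."""
    return sum(delays_ns) * DERATING

# path name -> (derated delay in ns, clock cycles available to the path)
SIX_BIT_PATHS = {
    "Max. Clock": (derated(7.5, 7.5), 1),            # 2 * tW
    "Q0 Delay": (derated(9.25, 1.0), 1),             # tPD(FO=6) + tSU
    "Q1 Delay": (derated(9.25, 5.5, 1.0), 2),        # one more level, FO=1
    "Q2-5 Delay": (derated(9.25, 5.5, 5.5, 1.0), 4), # two more levels
}

def limiting_path(paths):
    """Return (name, minimum clock period in ns) of the limiting path."""
    name = max(paths, key=lambda n: paths[n][0] / paths[n][1])
    delay, cycles = paths[name]
    return name, delay / cycles

def max_frequency_mhz(paths):
    """Highest clock frequency that satisfies every path requirement."""
    _, period_ns = limiting_path(paths)
    return 1000.0 / period_ns
```

For the 6-bit counter the limiting entry is the 18 ns internal clock period, which corresponds to roughly 55.6 MHz.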


An 18-bit counter implemented with these macros is given in 
Figure 5. The limiting frequency for the counter is determined 
similarly to the 6-bit example, and the path delays are given below 
for an A1280-1. 


Path         Datasheet Parameter(s)                                     Value                                           Requirement 
Max. Clock   2*tW*1.2                                                   2*7.5ns*1.2 = 18 ns                             Must be <= 1 Clock Cycle 
Q0 Delay     [tPD(18) + tSU]*1.2                                        [18ns + 1ns]*1.2 = 22.8 ns                      Must be <= 1 Clock Cycle 
Q1 Delay     [tPD(18) + tPD(1) + tSU]*1.2                               [18ns + 5.5ns + 1ns]*1.2 = 29.4 ns              Must be <= 2 Clock Cycles 
Q2-5 Delay   [tPD(6) + tPD(5) + tPD(5) + tPD(5) + tPD(1) + tSU]*1.2     [9.25ns + (8.7ns)*3 + 5.5ns + 1ns]*1.2 = 50 ns  Must be <= 4 Clock Cycles 


Thus: Maximum frequency is limited by the Q0 delay to 44 MHz. 


A fan-out limited version of the design is given in Figure 6. The 
2-bit prescale is replicated to keep fan-out at 6 or less. The 
frequency calculation is now similar to the 6-bit counter with a Q2-5 
Delay like the previous 18-bit counter. The resulting frequency is 
given for the ACT 2 family of devices below. 


                   A1280-1    A1240-1    A1225-1 
Max. Clock Delay   18 ns      13 ns      12 ns 
Q0 Delay           12.3 ns    11.07 ns   11.07 ns 
Q1 Delay           18.9 ns    17 ns      17 ns 
Q2-5 Delay         25.5 ns    23 ns      23 ns 
Max. Frequency     55 MHz     75 MHz     85 MHz 


Conclusion 


A technique for creating very fast counters in the ACT 2 family was 
given showing operating frequencies as high as 85 MHz. These 
techniques will allow FPGAs to be used in new applications which 
will further increase the popularity of these key devices. 


1.) “Increase FPGA Performance Using Module/Speed 
Trade-Offs”, The FPGA Design Guide, Actel Corporation, 
Sunnyvale, CA (August 1991) 








Figure 3. 4-Bit Counter Macro 











Figure 4. 6-Bit Counter 











Figure 5. 18-Bit Counter 


Figure 6. 18-Bit Counter with Fan-Out Reduced Design 


Designing Adders and Accumulators with the ACT 2 Architecture 





Introduction 


Many designers are used to implementing adders using 
carry-propagation techniques. The multiplexor-based ACT™ 2 
combinatorial module (C-mod) allows for the more efficient carry- 
select design. This method partitions the add function into blocks 
that perform two additions simultaneously on a number of bits of 
the two operands. 


The two additions are the same except that one assumes a carry-in 
and one has no carry-in. The two sums are input to 2:1 multiplexors, 
one for each bit pair. The carry line, from the low bits to the high 
bits, is used to select the appropriate sum for each block. 


The ACT 2 architecture lends itself well to implementing adders of 
various sizes using the carry-select technique. A sample design for a 
16-bit adder, as shown in Figure 1, will be used to illustrate adder 
design. 


Balancing Sum and Carry Levels 


The method for obtaining optimal performance from a carry-select 
adder is to design it such that the number of levels of logic required 
for the carry chain equals the number for the largest sum block as 
closely as possible. When they have the same number of levels, the 
sum bits arrive at the data pins and the carry arrives at the select 
pins of the output multiplexing stage simultaneously. 




The way to balance the levels of logic modules for the sum blocks 
with the carry is to partition the sum blocks considering the logic 
levels required for the sums and the levels for the carry between 
sums. The size of the partitions varies with width of the data. The 
ACT 2 library contains some powerful hard macros that are used to 
shorten the levels of logic required for generating sums and carries. 
The description of the sample design will illustrate the use of the 
macros. 


Sample Design 


For the 16-bit adder, the optimal organization is to perform two 
two-bit additions on the four least significant bits with the 
remaining higher order bits broken into four sections of three bits 
each. 
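
The partition just described can be modelled behaviourally. The sketch below follows the carry-select idea (each block is summed twice and the incoming carry picks one result), but it is only an arithmetic illustration, not the module-level implementation; the names are hypothetical, and computing both sums for the lowest block as well is a simplification of this sketch.

```python
# Block widths, least significant block first: two 2-bit blocks on the
# four LSBs, then four 3-bit blocks (2 + 2 + 3 + 3 + 3 + 3 = 16 bits).
BLOCK_SIZES = (2, 2, 3, 3, 3, 3)

def carry_select_add16(a, b):
    """Add two 16-bit values block by block; returns (sum, carry_out)."""
    result, carry, shift = 0, 0, 0
    for size in BLOCK_SIZES:
        mask = (1 << size) - 1
        x, y = (a >> shift) & mask, (b >> shift) & mask
        sum_no_carry = x + y     # block result assuming carry-in = 0
        sum_carry = x + y + 1    # block result assuming carry-in = 1
        # The carry from the lower bits acts as the 2:1 multiplexor select.
        chosen = sum_carry if carry else sum_no_carry
        result |= (chosen & mask) << shift
        carry = chosen >> size
        shift += size
    return result, carry
```

In hardware the two candidate sums of every block are formed in parallel, so only the select chain lies on the critical path.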


In the top-level schematic the addition logic of the two least 
significant bits is visible. The other additions are performed in 
lower levels of the design hierarchy described in the next section. 


Carry Logic 


The ACT 2 library includes two two-level carry hard macros. One 
macro generates a carry for the two bit pairs assuming the carry-in 
is true and the other assumes it is false. The latter macro may be 
seen at the bottom of Figure 1 making the carry for the two least 
significant bits. 


The carry macro output drives the select line for the 2:1 
multiplexors for sum bits two and three. It also drives the select line 
on the cascade multiplexor. The cascade multiplexor is a special 
ACT 2 hard macro that can propagate two levels of carry. The 
macro is depicted in Figure 2 and has five inputs. The top 
multiplexor inputs select the most significant sum or carry. The 
three lower inputs drive logic that implements a simplified form of 
a 2:1 multiplexor. 





Figure 2. Cascade Multiplexor Macro 





A fully implemented 2:1 cascade multiplexor does not map into the 
ACT 2 module efficiently, but the full functionality is not required 
in a carry-select adder. A simplified version of the cascade 
multiplexor that maps into a C-mod or that can be combined with a 
flip-flop in a sequential module (S-mod) is available. 







The simple version has logic driving the select for the upper level 
multiplexor consisting of only a two-input OR driving one input of 
a two-input AND. The two OR gate inputs are driven by the carry 
output from the next lower sum block assuming no carry-in and the 
carry-in from the rest of the lower bits of the adder. The remaining 
AND input is the carry from the sum block which assumes a 
carry-in. 

The logic is correct for a carry-select adder because if the 
assume-no-carry-in input is true (meaning that a carry was 
generated within that sum block), then the assume-carry-in input is 
always true (since its sum equals the no-carry-in sum plus one), which 
completes the AND function. 


If the carry from the lower bits is true (meaning a carry is 
propagated to the sum block), then we complete the AND if the 
assume-carry-in input is true. 
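
The argument can be confirmed exhaustively for a three-bit sum block. The following sketch compares the simplified OR-AND select with a full 2:1 multiplexor over every operand pair; the function names are illustrative and not taken from the macro library.

```python
from itertools import product

def block_carries(x, y, size=3):
    """Carry-outs of one sum block for carry-in = 0 and carry-in = 1."""
    c_no_carry = (x + y) >> size
    c_carry = (x + y + 1) >> size
    return c_no_carry, c_carry

def full_select(c_no_carry, c_carry, carry_in):
    """Full 2:1 multiplexor controlled by the carry from the lower bits."""
    return c_carry if carry_in else c_no_carry

def simplified_select(c_no_carry, c_carry, carry_in):
    """Two-input OR driving one input of a two-input AND."""
    return (c_no_carry | carry_in) & c_carry

def simplified_matches_full(size=3):
    """True if both forms agree for every operand pair and carry-in."""
    for x, y in product(range(1 << size), repeat=2):
        c0, c1 = block_carries(x, y, size)
        for cin in (0, 1):
            if simplified_select(c0, c1, cin) != full_select(c0, c1, cin):
                return False
    return True
```

The two forms differ only for the input combination with the no-carry-in carry true and the carry-in carry false, which a sum block can never produce.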


Three-Bit Carry Select Adder 


The schematic for the three-bit adder block appears in Figure 3. 
The adder requires thirteen logic modules to generate the three 
sum and carry pairs. All the output paths are two levels of logic or 
less. 


The two carries for the three bits come from two-level carry hard 
macros driving a three-bit majority macro. 


All the sums are generated from exclusive OR or NOR gates. The 
Actel library contains several two-module adder macros with both 
a sum and a carry output. These macros may be used as part of a 
sum depending on how well they fit into the overall adder soft 
macro structure. 


Figure 3. Three-Bit Adder Block 





Making an Adder into an Accumulator 


All the sum and carry outputs of the adder macro are combinable 
into a single ACT 2 S-mod. This combinability feature means that 
if the data inputs of a 16-bit register are schematically connected to 
an adder's outputs, the ALS software will automatically put the 
adder output macros (2:1 multiplexors or cascade multiplexors) into 
their respective flip-flops in the register. 


The registered-output adder will suffer no degradation in 
performance from the combining because the delay through the 
combinatorial part of the S-mod is less than that of an uncombined 
macro. Tying the register output back into the inputs will make the 
circuit into an accumulator. A sample design for an accumulator 
made from an adder and a register may be seen in Figure 4. 











[Block diagram: FADD16 adder with inputs A[15:0] and B[15:0] and output S[15:0], driving a register with input D[15:0] and output Q[15:0].] 







Figure 4. 16-Bit Accumulator 





Sample Design Results 


The sample design uses 82 ACT 2 modules. The slowest path in a 
function is usually the one with the most levels of logic. In this case 
it is the carry chain which has four levels of logic. As mentioned 
previously, all other paths have fan-outs of three or less. 


The modules in the chain have fan-outs of three, four, seven, and 
four. Criticality may be used to optimize the path performance. 
Criticality works best when fan-out is low. When the fan-out of a 
speed-sensitive net exceeds seven, performance can usually be 
most improved by adding redundant logic. For fan-outs of less than 
seven, adding a redundant module may bring no improvement. 
Using redundant logic for fan-outs of seven should be considered 
on an individual basis. Adding a redundant module to the carry 
path would change its fan-outs to three, five, three, and four. The 
expense of one module may be justified by the performance 
improvement from lowering the fan-out. 


It is also possible to improve performance by pipelining an adder. 
Since all of the combinatorial functions used in the adder are 
combinable (if the function's output drives a flip-flop, ALS will put 
both in a single S-mod), designers may pipe the adder at the points 
that provide the best performance at no cost in additional modules. 


Other Adder Macros 


The carry select architecture is extensible to adders of any size. 
Adders of eight to fifteen bits may be designed using the technique 
in three levels of logic. Adders from 16 to 24 bits can be done in four 
levels. 


When adapting the adder design to other operand sizes, remember 
to repartition the sum block sizes to match the logic levels of sums 
and carries. 








Increase FPGA Performance Using Module/Speed Trade-Offs 





Traditionally designers using TTL or PLDs had few alternatives 
when implementing a particular logic function. If faster speed was 
desired, a different technology or faster speed grade could be 
considered, but few other alternatives existed. FPGAs offer the 
designer a wide range of logic implementation choices because of 
the flexibility of the underlying logic and interconnect structure. 
This paper will explore one important aspect of this added 
flexibility — the ability to adjust the performance of a logic 
implementation by using larger or smaller amounts of logic. 


An 8-bit accumulator is a common building block in most digital 
systems, and as such it is a good example for exploring FPGA 
module/speed trade-offs. A simple design for the accumulator is 
shown in Figure 1. This design uses the simple sum and carry 
approach to implement the adder. A single level majority gate 
(MAJ3) is used to develop the carry based on the A, B, and 
CARRY-IN signals from each set of bits. This is the smallest (in 
module count) design possible using a parallel approach (serial 
approaches would use even smaller module counts, but we will limit 
the designs to only parallel approaches for the purpose of this 
analysis). It requires 24 logic modules to implement the design in 
the ACT™ 2 library. The worst case delay through the design is 
from data-in to carry-out and takes 8 logic levels. 
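
A behavioural sketch of the sum-and-carry approach makes the 8-level carry path visible: each bit's carry is the majority of A, B, and the incoming carry, and must wait for the carry of the bit below. The function names are illustrative and do not correspond to library macro pin names.

```python
def maj3(a, b, c):
    """Majority gate: 1 when at least two of the three inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def ripple_add8(a, b, carry_in=0):
    """8-bit ripple adder; returns (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i   # sum bit for position i
        carry = maj3(ai, bi, carry)       # one more level on the carry chain
    return total, carry
```

The loop body runs once per bit, mirroring the eight majority gates in series between data-in and carry-out.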


An alternative design can be implemented using a carry prediction 
scheme where sum and carry signals are developed assuming the 
carry is either a 1 or a 0. The actual value of the carry is then used 
downstream to determine the actual sum or carry. Conceptually 
this is like computing 2 results and selecting the correct one at the 
last minute (or nanosecond in this case). This design is shown in 
Figure 2. It uses 40 logic modules and has 3 levels of delay worst 
case. 


To better understand this design, notice the method used to 
develop C4, the CARRY for S4. C4 will be either C4_1 if C2 is a 1 or 
C4_0 if C2 is a 0. (C4_1/0 are developed from A2/B2/A3/B3 by the 
single module macros CY2A/B. CY2A assumes C2 is 0 and CY2B 
assumes C2 is 1, indicated by the 0/1 displayed in the middle of the 
symbols.) C2 is developed from C2_0 and C2_1 depending on the 
value of CIN. The single module macro CS1 is used to evaluate the 
resulting C4 output in a single logic delay. 
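
The development of C4 can be sketched arithmetically. In the model below, `carry_two_pairs` stands in for the behaviour attributed to CY2A and CY2B, and `select` for the final selection step; both are hypothetical Python helpers, not the library macros, and forming C2_0 and C2_1 from bits 0 and 1 with the same helper is an assumption of this sketch.

```python
def carry_two_pairs(a_lo, b_lo, a_hi, b_hi, assumed_carry):
    """Carry out of two adjacent bit pairs for an assumed carry-in."""
    total = (a_lo + b_lo + assumed_carry) + 2 * (a_hi + b_hi)
    return total >> 2

def select(value_if_0, value_if_1, actual):
    """Pick the precomputed value that matches the actual carry."""
    return value_if_1 if actual else value_if_0

def c4(a, b, cin):
    """Carry into bit 4 for the low four bits of operands a and b."""
    def bit(value, i):
        return (value >> i) & 1
    c2_0 = carry_two_pairs(bit(a, 0), bit(b, 0), bit(a, 1), bit(b, 1), 0)
    c2_1 = carry_two_pairs(bit(a, 0), bit(b, 0), bit(a, 1), bit(b, 1), 1)
    c4_0 = carry_two_pairs(bit(a, 2), bit(b, 2), bit(a, 3), bit(b, 3), 0)
    c4_1 = carry_two_pairs(bit(a, 2), bit(b, 2), bit(a, 3), bit(b, 3), 1)
    c2 = select(c2_0, c2_1, cin)   # CIN chooses between C2_0 and C2_1
    return select(c4_0, c4_1, c2)  # C2 chooses between C4_0 and C4_1
```

All four candidate carries depend only on the operand bits, so they are available in parallel and only the selections wait on CIN.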


The higher order sums are developed in a similar manner, with the 
sum being computed assuming CARRY-IN is a 0 (using an XOR) or 
a 1 (using an XNOR). S7, for example, the most complex sum, is 
computed by developing the 2 possible sums assuming a value for 
C4 and selecting between them using C4. Notice the sums are 
computed by developing a CARRY 7 from C6_0/1, A6 and B6 
using a carry generate and propagate technique. The entire 
function requires only 3 logic levels, but uses 40 logic modules, 
about 1.7 times as many modules as the ripple adder in Figure 1. 
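
The XOR/XNOR pairing used for the predicted sums can be stated in a few lines. This is a one-bit illustration with a hypothetical function name; it does not reproduce the full S7 logic.

```python
def predicted_sum_bit(a, b, carry_in):
    """Sum bit chosen from two candidates formed before the carry arrives."""
    sum_if_carry_0 = a ^ b         # XOR: correct when carry-in is 0
    sum_if_carry_1 = 1 - (a ^ b)   # XNOR: correct when carry-in is 1
    return sum_if_carry_1 if carry_in else sum_if_carry_0
```

Both candidates depend only on the operand bits, so the late-arriving carry has to pass through nothing more than the final selection.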


Modifications of the carry select adder are shown in Figures 3 and 
4. They require 34 and 32 modules and 4 and 5 levels of logic 
respectively. In Figure 3, 6 logic modules are saved by adding C2 
and C6 outputs from the carry generation logic and simplifying the 
sums for S2, S3, S5, S6, and S7. In Figure 4, S7 was simplified further 
using an extra level of logic. Further modifications to the logic 
would reduce module count even further by using more logic 
levels. 


These 4 examples show that there is a variety of possible solutions 
to a particular logic design problem. In general the solution space 
will be similar to the graph in Figure 5 where the logic levels 
required can be low (and thus high speed) if more modules are 
used, or high (and thus slow speed) if fewer modules are used. 
Other types of designs which can benefit from this module/speed 
trade-off attribute of FPGAs are counters, random logic, 
multipliers, and state machines. 


Random logic and multipliers can also use the prediction method 
of function implementation by paralleling logic assuming the value 
of the critical signal and then using the actual value to select 
between the two possible results. Additional techniques for 
increasing performance of these types of functions, but which 
require more logic, are pipelining and paralleling logic to reduce 
fan-out. 


State machines can use bit-per-state techniques to increase 
performance¹. In some cases additional modules may be necessary 
(for example, some very complex state machines using more than 
20 states), but in many cases module decreases are possible. 
Pipelining and fan-out reducing techniques can also apply to state 
machines. 


Counters can also utilize the prediction technique by using the least 
significant bits to select between the two possible counting results, 
toggle and hold². This technique requires more logic modules than 
traditional techniques, but can increase performance dramatically 
since the least significant bits require only a single logic level in all 
counter bit positions. The most significant bits can use more logic 
levels since they change very infrequently. 


Conclusion 


Speed is not a constant when designing with FPGAs. It can be 
adjusted to a large degree depending on the number of logic 
modules used to implement the function. Faster implementations 
can be designed using more logic modules, slower ones using fewer. 


1) "FPGAs are Better for State Machines than PLDs", The 
FPGA Design Guide, Actel Corporation, Sunnyvale, CA 
(August 1991) 


2) "Designing Counters with the ACT 2 Architecture", The 
FPGA Design Guide, Actel Corporation, Sunnyvale, CA 
(August 1991) 













8 Logic Levels 
24 Modules 


Figure 1. 8-Bit Ripple Adder 





3 Logic Levels 
40 Modules 


Figure 2. 8-Bit Full Carry Prediction Adder 


4 Logic Levels 
34 Modules 


Figure 3. 8-Bit Partial Carry Select Adder 


5 Logic Levels 
32 Modules 


Figure 4. 8-Bit Partial Carry Select Adder 











[Graph: Modules Used versus Logic Levels Needed. Vertical axis: Number of Modules Used; horizontal axis: Number of Logic Levels.] 


Figure 5. 8-Bit Adder 

















FPGAs are Better for State Machines than PLDs 





The traditional methodology for designing state machines has 
been to draw a state diagram, map the states into the minimum 
number of register bits needed to encode the state, and determine 
the next state function from a Karnaugh map or equivalent 
technique. This results in a minimum number of registers but 
usually requires wide gating to determine the next state bit. PLDs 
are register-lean but can do wide gating, so their architecture fits 
into this methodology fairly easily. It is no wonder that designers 
see PLDs as being best for state machines. 


FPGAs, however, offer designers a different set of resources. 
Registers are abundant and gating is optimized for more narrow 
functions. A state machine designed to take advantage of the 
abundance of registers available on an FPGA is more efficient than 
a wide logic oriented implementation. 


State Machine Design Example 


Let's take the example state machine illustrated in Figure 1. It is the 
control section of a 4-channel DMA controller super-macro for 
Actel FPGAs and contains 6 states (State0, ..., State5), 7 inputs (A, 
B, C, D, PBGNT, MACK, CONT), and 5 outputs (PBREQ, 
CNTLD, CMREQ, CE, CLD). The state transition equations for 
the state machine have been computed and are listed in Table 1 
using PLD style logic equations. 

Table 1. State Transition Equations for Example State Machine 


State0 := /A*/B*/C*/D*State0 + /CONT*State4; 
State1 := (A+B+C+D)*State0 + /PBGNT*State1; 
State2 := PBGNT*State1 + /MACK*State2; 
State3 := MACK*State2 + MACK*State3; 
State4 := State3; 
State5 := CONT*State4 + /MACK*State5; 


The traditional approach would now map these 6 states into 3 state 
bits by making state assignments. If the assignments are made as 
indicated in Table 2, the logic in Figure 2 results. 


Table 2. State Assignment Used in Figure 2 Logic Design 


State0 = /Q2*/Q1*/Q0        State1 = /Q2*/Q1* Q0 
State2 = /Q2* Q1* Q0        State3 =  Q2* Q1* Q0 
State4 =  Q2*/Q1* Q0        State5 =  Q2*/Q1*/Q0 





The logic required to implement the state machine as illustrated in 
Figure 2 consumes 25 ACT™ 1 modules (note: a register costs 2 
modules and the AND4A macros cost 2 modules each). The 
longest path requires 6 levels of delay. 


In many cases state assignment may be time consuming since state 
assignment requires new transition terms to be determined after 
assigning state variables. Output logic also needs to be determined. 
Several iterations of state assignments may be required to get an 
optimal design. 


When registers are not expensive, state assignment is trivial if a 
register is used for each state. In that case, the design will 
directly implement the transition terms given in Table 1. The logic 
given in Figure 3 is the result of this Bit-Per-State (BPS) logic 
design. It uses 19 ACT 1 modules and only requires 4 levels of delay. 
This is a 25% logic savings, and a 40% speed improvement over the 
original register-lean design in Figure 2. 
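
A bit-per-state machine can be simulated directly from its transition terms, one expression per register. The sketch below uses active-high state bits for readability, whereas the design in Figure 3 uses active-low bits. It also evaluates the State3 term as MACK*State2 + MACK*State5; that reading is an assumption of this sketch, chosen because it gives State5 an exit and keeps exactly one register set at a time. The function names are illustrative.

```python
from itertools import product

def next_state(state, A, B, C, D, PBGNT, MACK, CONT):
    """One clock of the bit-per-state machine; state is a 6-tuple of bits."""
    s0, s1, s2, s3, s4, s5 = state
    request = A | B | C | D
    return (
        ((1 - request) & s0) | ((1 - CONT) & s4),  # State0
        (request & s0) | ((1 - PBGNT) & s1),       # State1
        (PBGNT & s1) | ((1 - MACK) & s2),          # State2
        (MACK & s2) | (MACK & s5),                 # State3 (ASSUMED: MACK*State5)
        s3,                                        # State4
        (CONT & s4) | ((1 - MACK) & s5),           # State5
    )

def stays_one_hot():
    """Check that every one-hot state leads to a one-hot state."""
    for active in range(6):
        state = tuple(int(i == active) for i in range(6))
        for inputs in product((0, 1), repeat=7):
            if sum(next_state(state, *inputs)) != 1:
                return False
    return True
```

Because each next-state expression depends on at most two state bits and a few inputs, every register's logic stays narrow, which is what lets the terms fit the small FPGA modules.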


Looking closely at the design you may notice that De Morgan's 
theorem has been used on the logic to optimize the design. The 
state bits are inverted to make the states active-low, and the (single 
module) OA2A and OA2 macros are used to create the transition 
terms. (In the ACT 1 library these macros will allow 2 transition 
terms of 2 inputs each to be mapped into a single module, but only 
one input can be active-high. Making states active-low allows the 
transition terms with active-high inputs to map into the OA2A 
macro, and transition terms with no active-high inputs to map into 
the OA2 macro. Thus, ACT 1 designs will usually be most efficient 
using active-low state bits for small state machines using the BPS 
design technique.) The reset signal sets all states except State0, 
which is cleared since it is the initial state for the machine. 


Active-High Implementation 


If active-high state bits are best suited for a particular state 
machine, they can also be used by clearing them on initialization 
and detecting the active-high state for transition purposes. This can 
also be done when output signals need to be active-high and they 
are a direct output from the state bit. For example, if the control 
signal CE needed to be active-high the logic in Figure 4 would be 
used. Notice the 3 changes. The transition term feeding the state 
register corresponding to CE is inverted, the transition term using 
CE is inverted, and the CE state register is cleared on reset to 
establish CE as inactive. These same 3 changes would be required 
whenever an active-high state or output is required. The inverse 
operation is required to change an active high state bit to 
active-low. 


Larger state machines may also be implemented using this 
technique by distributing control to several smaller machines and 
using a single master machine to coordinate activities. This usually 
results in a higher performance design also since control signals will 
be located near the logic they affect, minimizing routing delays. 
Usually it is also easier to design and debug since each machine can 
be more easily understood and interactions between operations are 
minimized. 













In many cases the BPS methodology will result in a faster speed 
than a PLD design since the wide gating PLD delay is slower than 
the corresponding delay in an Actel FPGA. See Table 3 for a 
comparison of the delays for the designs in an Actel A1020-2 and 
an Altera EPM5128-1™. 


Table 3. Performance Comparison Between A1020-2 and EPM5128-1 


A1020-2 Performance 
(Requires 19 modules out of 547 or 3.5%) 

ABCD to Y      6.55 ns (critical) 
Y to Reg       6.10 ns (critical) 
Reg SU         4.4 ns 
Clk to Q       6.10 ns (critical) 
Q to CMREQ     6.10 ns (critical) 
Total          29.25 ns 


EPM5128-1 Performance 
(Requires 8 macros out of 128 or 6.25%) 

ABCD to Reg    12 ns 
Reg SU         6 ns 
Clk to Q       1 ns 
Q to Feedbk    1 ns 
CMREQ          12 ns 
Total          32 ns 





Also note that the cost of the FPGA implementation will be much 
less than that of a PLD implementation since a smaller fraction of 
the device will be needed to implement the design. 


Summary 


BPS design is a technique for using the register-rich architecture of 
FPGAs to more easily design faster and cheaper state machines. It 
uses the familiar state machine design methodology but bypasses 
the state assignment step of the design. A ‘cook-book’ summary 
follows: 


1) Write state transition equations for each state. 


2) Assign each state to a separate register. Where possible, state 
bits should be made active-low to make it easier to construct 
transition terms in a single ACT 1 module. 


3) Take output signals directly from state outputs where possible. 
If an output signal must be active-high (it can almost always be 
made active-low by making sure the control signal is active-low 
at the destination), the state bit associated with it can be made 
active-high by inverting that bit's transition logic and inverting 
that state's input to any other transition term. 


4) All active-high state bits are reset on machine initialization 
and active-low state bits are preset on initialization. The 
initial state must be activated on initialization, so it should 
be cleared if active-low or preset if active-high. 
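
The cook-book steps can be illustrated with a small behavioral model. 
This is only a sketch: the three states (IDLE, BUSY, DONE) and the 
inputs req and ack are hypothetical and are not taken from the DMA 
controller example, and all state bits are given the same polarity 
for simplicity. 

```python
# Hypothetical three-state machine; state names and inputs are illustrative.
STATES = ("IDLE", "BUSY", "DONE")


def reset_registers(active_low=True):
    """Initialization (step 4), with every state bit sharing one polarity.

    Active-low bits: the initial state is cleared (0 = active) and the
    others are preset. Active-high bits: the initial state is preset and
    the others are reset.
    """
    active = {"IDLE": True, "BUSY": False, "DONE": False}
    return {s: (not a) if active_low else a for s, a in active.items()}


def next_registers(regs, req, ack, active_low=True):
    """One clock edge: each state bit has its own transition equation."""
    # Recover the logical (active-high) view of every state bit.
    cur = {s: (not v) if active_low else v for s, v in regs.items()}
    nxt = {
        "IDLE": (cur["IDLE"] and not req) or cur["DONE"],
        "BUSY": (cur["IDLE"] and req) or (cur["BUSY"] and not ack),
        "DONE": cur["BUSY"] and ack,
    }
    return {s: (not a) if active_low else a for s, a in nxt.items()}


def active_state(regs, active_low=True):
    """Return the single active state, checking the one-hot property."""
    hot = [s for s, v in regs.items() if v != active_low]
    assert len(hot) == 1, "exactly one state bit must be active"
    return hot[0]
```

Because there is no state assignment step, each transition equation 
depends only on a few state bits and inputs, which is what lets it 
fit in narrow gating. 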


Conclusion 


The register-rich, fast narrow-gating structure of FPGAs and the BPS 
design methodology allow faster and less expensive state machine 
implementations than PLDs. It's easier too! 










Figure 1. Example State Diagram from 4-Channel DMA Controller 










[Schematic not reproduced.] 


This implementation requires 29 ACT 1 modules and 6 levels of delay. 


Figure 2. Encoded State Machine 
















[Schematic not reproduced. Signals shown: CONT, /PBREQ, MACK, /CE, 
/CLD, RESET, /S5.] 


This implementation requires 19 ACT 1 modules and 3 levels. 


Figure 3. Bit-Per-State Implementation 


























[Schematic not reproduced. Signals shown: /S0, /PBREQ, MACK.] 


This implementation requires 19 ACT 1 modules and 3 levels. 


Figure 4. Active-High CE Output 
















Page Mode DRAM Controller 





Introduction 


The ACT™ 2 DRAM controller supermacro allows you to access 
up to 16 MBytes of memory space from two different channels. It 
contains automatic refresh circuitry and DRAM control logic. It 
can operate a 4 MBit DRAM in page mode for up to two thousand 
transfers starting from any point in a column. At the conclusion of a 
paged transfer, lock-out logic prevents any other accesses until all 
missed refreshes have been done. The supermacro uses just over 
13% of an A1280. The schematic symbol and architecture of the 
DRAM controller are shown in Figures 1 and 2. 





[Schematic symbol not reproduced: PAGE MODE DRAM CONTROLLER. Pin 
labels shown: PADDLE, PCNTLE, PAGECY, O[24:0], PA[23:0], CA[23:0].] 





Figure 1. Schematic Symbol 





Operational Overview 


Address and Data Buses 


Three buses enter the controller. Two bring in the two channel 
addresses; the third carries the processor data used to load the 
counters for paged operation. The processor can request a single or 
paged memory transfer via address decodes. Peripheral channels can 
make memory transfer requests from a DMA interface. 


Refresh Logic 


The two counters (U0, U1) time refresh requests through an 
up/down counter (udc). The udc serves as an intermediate counter 
to pass the requests to the memory arbitration controller. When 
the memory is busy for long periods, such as during page accesses, 
the udc stores refresh requests by incrementing each time a request 
is made. When the memory is free, the udc can request all the 
refreshes successively until the refreshes are caught up. Each time 
a refresh is complete, the udc is decremented until it is cleared. 
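
The behavior of the udc can be modeled in a few lines. This is a 
behavioral sketch only, not the supermacro's logic: the class name, 
the method name, and the maximum count of 15 are assumptions, since 
the text does not give the counter width. 

```python
class RefreshBacklog:
    """Up/down counter that remembers refreshes missed while memory is busy."""

    def __init__(self, max_pending=15):
        # max_pending is an assumed counter range; the source gives no width.
        self.max_pending = max_pending
        self.pending = 0

    def tick(self, refresh_due, refresh_done):
        """Advance one step.

        refresh_due:  the interval timer asked for a refresh (count up).
        refresh_done: a refresh cycle just completed (count down).
        Returns True while a refresh request should go to the arbiter.
        """
        if refresh_due and self.pending < self.max_pending:
            self.pending += 1
        if refresh_done and self.pending > 0:
            self.pending -= 1
        return self.pending > 0
```

For example, three timer requests arriving during a long page access 
leave the count at three, and the request to the arbiter stays 
asserted until three refresh cycles have completed. 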


Arbitration State Machine 


The arbitration state machine decides which channel has access to 
the memory and tells the memory timing control what type of 
access to begin. Its outputs are also used to select the source for the 
memory address and Read/Write source. 


Memory access requests are prioritized from highest to lowest in 
the following order: refresh, processor cycle, page cycle, and 
channel cycle. When the memory is idle, the highest priority 
request begins a cycle. According to the type of request, the 
arbitration logic moves through a sequence of states and then waits 
for the timing control to signal the end of the cycle. 
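
The priority rule can be sketched as a fixed-priority selection. The 
request names and the function below are illustrative only; the 
actual arbiter is a state machine whose states are not listed in the 
text. 

```python
# Request types in the priority order given in the text, highest first.
PRIORITY = ("refresh", "processor", "page", "channel")


def arbitrate(requests, memory_idle):
    """Return the request type to start next, or None.

    requests:    set of pending request names drawn from PRIORITY.
    memory_idle: True when no memory cycle is in progress.
    """
    if not memory_idle:
        return None  # a cycle is running; wait for the timing control
    for kind in PRIORITY:
        if kind in requests:
            return kind
    return None
```

With both a channel request and a refresh request pending, the 
refresh is started first; the channel request is served on a later 
idle cycle. 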


Memory Timing Control 


The memory timer issues the signals to control the operation of the 
DRAMs. It times the sequence and duration of the signals to 
conform to the requirements of the DRAMs and informs the 
arbitrator when the cycle is complete. 


The timer contains a counter for paged accesses which will operate 
paged transfers up to a limit of two thousand. 


Address Multiplexor 


The address multiplexor selects the appropriate address source for 
memory accesses and, under the control of the memory timer, 
drives the DRAMs with the row or column address. 


The multiplexor contains a register and a counter for paged 
accesses. Both are loaded by the processor. The register contains 
the address of the row for the page access and the counter has the 
beginning offset within the column. After each page access cycle, 
the column increments by one to supply the address for the next 
cycle. 
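
The address sequence for one paged transfer can be sketched as 
follows. The 11-bit column width is an assumption (a 4 MBit DRAM 
organized as 2048 rows by 2048 columns, which matches the limit of 
about two thousand transfers), and stopping at the end of the page 
is likewise assumed rather than stated in the text. 

```python
COLUMN_BITS = 11  # assumed: 4 MBit x 1 DRAM with 2**11 rows and 2**11 columns


def page_addresses(row, start_column, transfers):
    """Yield (row, column) for each cycle of one paged transfer.

    The row register stays fixed; the column counter starts at
    start_column and increments by one after every cycle. The sequence
    is cut off at the last column of the page (an assumption).
    """
    last = min(start_column + transfers, 1 << COLUMN_BITS)
    for column in range(start_column, last):
        yield row, column
```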


Bank and Read/Write 

The two most significant bits of memory address are used for the 
bank select. They are selected by multiplexors from the sources for 
memory addresses. The multiplexor outputs are then decoded by 
the memory timer to drive one of the four RAS lines. 


The processor loads a three-bit register with the page bank address 
and Read/Write line prior to initiating a page access. 
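
The bank decode can be sketched as follows, assuming the 24-bit 
address width shown on the schematic symbol. The function names are 
illustrative, and the RAS lines are modeled as active-high booleans 
for readability even though DRAM RAS pins are active-low. 

```python
ADDRESS_BITS = 24  # width of the processor and channel address buses


def bank_select(address):
    """Return the bank number (0-3) held in the two most significant bits."""
    return (address >> (ADDRESS_BITS - 2)) & 0b11


def ras_lines(address):
    """Decode the bank so that exactly one of the four RAS lines is True."""
    bank = bank_select(address)
    return [i == bank for i in range(4)]
```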



















[Block diagram not reproduced. Labels shown: Refresh Timing and 
Control; Memory Access Arbitration; Memory Timing Control; Memory 
Control; Memory Access Request; Page Mode Address and Transfer Count; 
Memory Address Multiplexor; Processor Data Bus; Memory Address; 
Processor and Channel Address Buses.] 


Figure 2. Page Mode DRAM Controller Block Diagram 







Four-Channel DMA Controller 








Introduction 


The ACT™ 2 DMA controller supermacro allows you to connect 
up to four devices to a common memory controller interface. The 
supermacro uses approximately 19% of an A1280. The schematic 
symbol and architecture of the DMA controller are shown in 
Figures 1 and 2. Each channel has a 24-bit register which is loaded 
by the processor with the starting memory address. 





[Schematic symbol not reproduced: FOUR-CHANNEL DMA CONTROLLER. Pin 
labels shown: CA[23:0], PD[23:0], AREQ, AMACK, BREQ, BMACK, CREQ, 
CMACK, DMACK.] 





Figure 1. Schematic Symbol 





Operation Overview 


When a channel requests a transfer, the controller asks the 
processor for use of the system bus. When the bus is granted, a 
state machine loads the channel's address (except for the two most 
significant bits, which are used for bank select) into a counter 
that drives the memory controller. The state machine also issues a 
transfer request to the memory controller. When the memory transfer 
completes, the state machine acknowledges it to the requesting 
channel, increments the address in the counter, and writes it back 
to the requesting channel's register. 


State Machine Description 


The four-channel memory transfer request lines are prioritized so 
that only one request is recognized at a time. The control of the 
DMA is handled by a state machine submacro containing 
additional logic for prioritization. A state graph of the state 
machine may be seen in Figure 3. 


In state S0 the machine loops awaiting a request. A request causes a 
transition to S1, which issues a processor bus request and waits for 
an acknowledgement. After the bus is granted, the state moves to S3, 
where the counter is loaded and a memory access is requested. 


The state machine loops on S3 until the transfer is complete and 
then moves to S7 to increment the counter. It next goes to S5 to 
load the counter value back into the channel's register, and then 
returns to S0 if there are no further requests from the channel. If 
there is another request from it, the machine loops through S4, S7, 
and S5, following the memory transfer, counter increment, and 
register load sequence described above. 
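
The state sequence described above can be written as a next-state 
function. This is a sketch under stated assumptions: the state 
labels are read as S0, S1, S3, S4, S5, and S7; the input names are 
hypothetical; and S4 is assumed to wait for the end of the transfer 
in the same way as S3. 

```python
def next_state(state, request, bus_granted, transfer_done):
    """Next state of the DMA control machine as described in the text."""
    if state == "S0":              # idle: wait for a channel request
        return "S1" if request else "S0"
    if state == "S1":              # processor bus requested
        return "S3" if bus_granted else "S1"
    if state in ("S3", "S4"):      # memory access in progress
        return "S7" if transfer_done else state
    if state == "S7":              # increment the address counter
        return "S5"
    if state == "S5":              # write the counter back to the register
        return "S4" if request else "S0"
    raise ValueError(f"unknown state {state!r}")
```

A single transfer visits S0, S1, S3, S7, S5 and returns to S0; a 
channel that keeps its request asserted continues through S4, S7, 
and S5 without releasing the bus. 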
















[Block diagram not reproduced. Labels shown: Data Bus; State Machine; 
Control; Register (several); Register Bank Select; Bank Multiplexor; 
Memory Address; Address Multiplexor; Counter.] 


Figure 2. DMA Controller Block Diagram 













[State graph not reproduced. Labels shown: Counter Load, Channel 
Memory Req; Count Enable; Register Load; Channel Memory Req; Memory 
Transfer Ack.] 





Figure 3. Four-Channel DMA Macro State Machine Control 










Using FPGAs for Digital PLL Applications 





In addition to purely digital applications, many designs use Field 
Programmable Gate Arrays (FPGAs) for DSP. We'll examine one 
such application, digital PLLs, to show various ways of 
implementing PLL designs using FPGAs. 


Pulse Steal PLL 


In telecommunications applications it is often desirable to generate 
a digital signal which is locked to an incoming signal (INPUT) and 
is some multiple of its frequency. A drawing of a pulse steal PLL, 
which is a simple way to generate such a signal, may be seen in 
Figure 1. Note that the design contains an ordinary oscillator, but 
no VCO. Except for the crystal, the entire design will operate in an 
FPGA. OSC is a multiple (K) of the input frequency (INPUT). 


Note the frequency relationship that holds at points A and B in the 
figure, where: 


    OSC/(K*M) = INPUT/N = Comparison Frequency                  (1) 


The technique is based on selecting a reference oscillator 
frequency that is slightly higher than OSC. This frequency 
(OSC + Δ) should be chosen so that: 


    1/Comparison Freq. - (K*M)/(OSC + Δ) = 0.5*(1/OSC)          (2) 


The right side of Equation 2 equals one half the period of the 
reference oscillator. 
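
Equation 2 can be solved for the frequency offset. Reading the 
equation as 1/(Comparison Frequency) - (K*M)/(OSC + delta) = 
0.5/OSC, and substituting 1/(Comparison Frequency) = (K*M)/OSC from 
Equation 1, gives delta = OSC/(2*K*M - 1). The sketch below is a 
worked check of that rearrangement; the function names are 
illustrative and the example values are hypothetical. 

```python
def reference_delta(osc, k, m):
    """Solve Equation 2 for the frequency offset, in the units of osc.

    With 1/f_cmp = K*M/OSC from Equation 1, Equation 2 becomes
        K*M/OSC - K*M/(OSC + delta) = 0.5/OSC
    which rearranges to delta = OSC / (2*K*M - 1).
    """
    return osc / (2 * k * m - 1)


def equation2_residual(osc, delta, k, m):
    """Left side minus right side of Equation 2; zero when it is satisfied."""
    return k * m / osc - k * m / (osc + delta) - 0.5 / osc


# Hypothetical example: OSC normalized to 1.0, K = 2, M = 4.
delta = reference_delta(1.0, 2, 4)
print(delta, equation2_residual(1.0, delta, 2, 4))
```

The larger the product K*M, the smaller the offset needed, since the 
half-period lead is accumulated over more oscillator cycles. 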


The reference oscillator frequency delta causes point B (the 
detector flip-flop D input) to begin to precede point A (the detector 
flip-flop clock input) by half a period. When the D input edge leads 
by a sufficient margin, the detector clocks true and begins a pulse 
train through the two deglitching flip-flops. The output of the 
second of these clears all three flip-flops and steals a pulse by 
disabling the divide-by-K output. Stealing the pulse puts point B 
behind A until the reference oscillator delta can move it ahead by 
one period, repeating the cycle. Points A and B are always within 
one half cycle of each other. 


The circuit allows the frequency of the output signal to be selected 
simply by adjusting the values of the dividers K and M. The lock 
range of the loop is given by the following: 


    Lock Range = ± ((OSC + Δ)/OSC)*INPUT                        (3) 











[Block diagram not reproduced. Labels shown: Reference Oscillator; 
Input; Divide by N; Divide by K (with Enable); Divide by M; points A 
and B; Detect; Deglitch; Output.] 





Figure 1. Block Diagram 
















ACTEL CORPORATION DIRECT SALES OFFICES 


UNITED STATES 

955 E. Arques Avenue 
Sunnyvale, CA 94086 
Tel: (408) 739-1010 

6525 The Corners Parkway 
Suite 400 
Norcross, GA 30092 
Tel: (404) 409-7888 

8130 McFadden Avenue 
Suite 109 
Westminster, CA 92683 
Tel: (714) 373-4488 

1740 Mass Avenue 
Boxborough, MA 01719 
Tel: (508) 635-0010 

3800 N. Wilke Road, Suite 300 
Arlington Heights, IL 60004 
Tel: (708) 259-1501 

2350 Lakeside Blvd., Suite 850 
Richardson, TX 75082 
Tel: (214) 235-8944 


ENGLAND 

Actel Europe Ltd. 
Intec 2, Unit 22 
Wade Road 
Basingstoke 
Hants RG24 0NE 
Tel: (256) 29.209 


GERMANY 

Actel Central Europe 
Dingolfinger Strasse 2 
W-8000 Muenchen 80 
Tel: (89) 41.80.0078 





DOMESTIC REPRESENTATIVES 


ALABAMA 
Rep Inc. ..... (205) 881-9270 

ARIZONA 
Luscombe Engineering ..... (602) 949-9333 

CALIFORNIA 
Centaur Corporation (Calabasas) ..... (818) 591-1655 
Centaur Corporation (Irvine) ..... (714) 261-2123 
Centaur Corporation (San Diego) ..... (619) 278-4950 
12 Inc. (Roseville) ..... (916) 784-0530 
12 Inc. (Santa Clara) ..... (408) 988-3400 

COLORADO 
Thom Luke Sales, Inc. ..... (303) 649-9717 

CONNECTICUT 
CompRep Associates ..... (203) 230-8369 

FLORIDA 
Sales Engineering Concepts (Altamonte Springs) ..... (407) 830-8444 
Sales Engineering Concepts (Deerfield Beach) ..... (305) 426-4601 

GEORGIA 
Rep Inc. ..... (404) 938-4358 

ILLINOIS 
Carlson Electronic Sales Associates ..... (708) 956-8240 

INDIANA 
Bailey's Electronic Sales & Technology ..... (317) 848-9958 

IOWA 
Carlson Electronic Sales Associates ..... (319) 378-1450 

KANSAS 
DLE Electronics ..... (316) 683-6400 

MARYLAND 
(301) 544-4100 

MASSACHUSETTS 
CompRep Associates ..... (617) 329-3454 

MICHIGAN 
Electronic Sources, Inc. ..... (313) 227-3598 

MINNESOTA 
Gibb Technology Sales ..... (612) 835-3370 

MISSOURI 
John G. Macke Co. ..... (314) 432-2830 

NEW JERSEY 
Nexus ..... (201) 947-0151 

NEW YORK 
L-MAR Associates (Fairport) ..... (716) 425-9100 
L-MAR Associates (Glen Falls) ..... (518) 798-6225 
L-MAR Associates (Lackawanna) ..... (716) 826-1301 
L-MAR Associates (Marcellus) ..... (315) 673-1325 
L-MAR Associates (Troy) ..... (518) 235-0962 

NORTH CAROLINA 
(919) 469-9997 

OHIO 
J.R. Thornberry Co. (Chagrin Falls) ..... (216) 247-0060 
J.R. Thornberry Co. (Dublin) ..... (614) 792-5171 

OREGON 
(503) 629-8555 

PENNSYLVANIA 
Omega Electronic Sales (Trevose) ..... (215) 244-4000 
J.R. Thornberry Co. (Bridgeville) ..... (412) 745-8441 

TENNESSEE 
(615) 475-4105 

TEXAS 
OM Associates (Austin) ..... (512) 794-9971 
OM Associates (Houston) ..... (713) 789-4426 
OM Associates (Richardson) ..... (214) 690-6746 

UTAH 
First Source ..... (801) 943-6894 

WASHINGTON 
(208) 482738555 

WISCONSIN 
Carlson Electronic Sales Associates ..... (414) 476-2790 

CANADA 
Clark-Hurman Associates (Ontario-Brampton) ..... (416) 840-6066 
Clark-Hurman Associates (Ontario-Nepean) ..... (613) 727-5626 
Clark-Hurman Associates (Quebec) ..... (514) 426-0453 





DOMESTIC DISTRIBUTORS 


WYLE LABORATORIES 

ARIZONA 
Phoenix ..... (602) 437-2088 

CALIFORNIA 
Calabasas ..... (818) 880-9000 
Irvine ..... (714) 863-9953 
Rancho Cordova ..... (916) 638-5282 
San Diego ..... (619) 565-9171 
Santa Clara ..... (408) 727-2500 

COLORADO 
Thornton ..... (303) 457-9953 

MASSACHUSETTS 
Burlington ..... (617) 272-7300 

OREGON 
Beaverton ..... (503) 643-7900 

TEXAS 
Austin ..... (512) 345-8853 
Houston ..... (713) 879-9953 
Richardson ..... (214) 235-9953 

UTAH 
West Valley City ..... (801) 974-9953 

WASHINGTON 
Redmond ..... (206) 881-1150 


DOMESTIC DISTRIBUTORS (continued) 


PIONEER STANDARD ELECTRONICS 

CALIFORNIA 
Irvine ..... (714) 753-5500 
Woodland Hills ..... (818) 883-4640 

CONNECTICUT 
Shelton ..... (203) 929-5600 

ILLINOIS 
Addison ..... (708) 495-9680 

INDIANA 
Indianapolis ..... (317) 573-0880 

MASSACHUSETTS 
Lexington ..... (617) 861-9200 

MICHIGAN 
Livonia ..... (313) 525-1800 

MINNESOTA 
Eden Prairie ..... (612) 944-3355 

NEW JERSEY 
Fairfield ..... 00) 1575-9510 

NEW YORK 
Binghamton ..... (607) 722-9300 
Fairport ..... (716) 381-7070 
Woodbury ..... (516) 921-8700 

OHIO 
Cleveland ..... (216) 587-3600 
Dayton ..... (513) 236-9900 

PENNSYLVANIA 
Pittsburgh ..... (419) 782-0900 

TEXAS 
(512) 835-4000 
(214) 386-7300 
Houston ..... (713) 495-4700 

WISCONSIN 
Brookfield ..... (414) 784-3480 


PIONEER TECHNOLOGIES 

ALABAMA 
Huntsville ..... (205) 837-9300 

CALIFORNIA 
San Jose ..... (408) 954-9100 

FLORIDA 
Altamonte Springs ..... (407) 834-9090 
Deerfield Beach ..... (305) 428-8877 

GEORGIA 
(404) 623-1003 

MARYLAND 
Gaithersburg ..... (301) 921-0660 

NORTH CAROLINA 
Charlotte ..... (704) 527-8188 
Durham ..... (919) 544-5400 

PENNSYLVANIA 
Horsham ..... (215) 674-4000 


ZENTRONICS (Canada) 

ALBERTA 
Calgary ..... (403) 295-8838 
Edmonton ..... (403) 484-1669 

BRITISH COLUMBIA 
(604) 273-5575 

MANITOBA 
Winnipeg ..... (204) 694-1957 

ONTARIO 
(416) 564-9600 
(613) 226-8840 

QUEBEC 
(514) 737-9700 
Ste. Foy ..... (418) 654-1077 


INTERNATIONAL DISTRIBUTORS 

AUSTRALIA 
Reptechnic (Neutral Bay, NSW) ..... (2) 953.9844 

AUSTRIA 
Codico G.m.b.H. (Perchtoldsdorf) ..... (222) 86.24.28 

BELGIUM 
Acal Auriema NV/SA (Zaventem) ..... (2) 720.5983 

DENMARK 
Nordisk Electronik AS (Herlev) ..... (42) 84.20.00 

EGYPT 
SEE (Cairo) ..... (2) 665.948 

ENGLAND 
Gothic-Crellon Ltd. (Berkshire-Wokingham) ..... (0734) 78.88.78 
Manhattan Skyline Ltd. (Berkshire-Maidenhead) ..... (0628) 75.85.1 
Microprocessor & Memory Dist. (Berkshire-Reading) ..... (734) 31.32.32 

FINLAND 
OY Fintronic AB (Helsinki) ..... (0) 69.26.022 

FRANCE 
A2M (Le Chesnay Cedex) ..... (39) 5491.13 
SCAIB S.A. (Meylan-Zirst) ..... (76) 90.22.60 
SCAIB S.A. (Rungis Cedex) ..... (1) 46.87.2313 

GERMANY 
REIN Elektronik G.m.b.H. (Nettetal) ..... (2153) 733.0 

HONG KONG 
Twin-Star Trading Co. (Yau Tong) ..... (852) 346.9085 

INDIA 
Benchmark Systems (Singapore) ..... (65) 299.1605 

ISRAEL 
AS TAO (Herzlia) ..... (52) 58.33.55 

ITALY 
LASI Elettronica S.p.A. (Milan) ..... (2) 66.10.1370 

JAPAN 
Innotech Corporation (Yokohama-Shi) ..... (45) 474.9037 
Matsushita Electronics Corporation (Kyoto) ..... (75) 951.8151 

KOREA 
Eastern Electronics, Inc. (Seoul) ..... (2) 553.2997 

NETHERLANDS 
Transfer B.V. (Enschede) ..... (53) 33.03.36 

NORWAY 
Nortec Electronics A/S (Hvalstad) ..... (2) 84.62.10 

SOUTH AFRICA 
ASIC Design Services (Sandton) ..... (11) 786.8144 

SPAIN 
Semiconductores S.A. (Barcelona) ..... (3) 217.23.40 
Semiconductores S.A. (Madrid) ..... (1) 7422313 

SWEDEN 
Nortec Electronics AB ..... (8) 705.18.00 

SWITZERLAND 
Actel Sulzer (Bruegg) ..... (32) 53.63.75 

TAIWAN 
Aexcel Technology Corporation (Taipei) ..... (2) 712.7321 


Distributed by: 

MMD 
Microprocessor & Memory Distribution Ltd. 
The Specialist Technical Distributor 
3 Bennet Court, Bennet Road, Reading, Berkshire RG2 0QX 
Telephone: (0734) 313232 
Fax: (0734) 313255   Telex: 846669 


Actel Corporation 
955 East Arques Avenue 
Sunnyvale, CA 94086 
408.739.1010 
Technical Hotline: 800.262.4060 

5172021-0 





